FINAL REPORT


Anthropomorphic Interfaces on Automation Trust, Dependence, and Performance in Younger and
Older Adults

Richard  Pak  PhD 
Clemson  University 


Executive  Summary 

This project sought to better understand the psychological component of human-automation
interaction, with a focus on what makes automation seem “trustable”. Specifically,
we investigated the role of anthropomorphic automation in operators’ trust, dependence, and
performance with automation. Evidence from the literature and our own previously collected data
suggested that the design of automation can affect how operators perceive the automation and
their likelihood of using it. We sought to identify the conditions under which
anthropomorphized automation, or automation that appears to possess human-like
characteristics, affects the calibration of trust between the operator and the system. A secondary
goal was to understand how the effects of anthropomorphic automation are moderated by the age
of the operator, because older users react differently to automation (some research shows
over-trust while other research shows under-trust).

The general goal of this project was to examine how extensive use of social responses deliberately
engendered  by  anthropomorphic  agents  could  convey  to  operators  the  “trustability”  of 
automation  and  how  this  is  affected  by  operator  characteristics.  Given  some  of  the  observed 
effects  of  minimal  anthropomorphism  (our  study;  Parasuraman  &  Miller,  2004)  what  are  the 
critical  factors  that  must  be  manipulated  to  affect  perceptions  of  trust  and  dependence?  Under 
what  conditions  do  we  observe  effects?  Ultimately,  the  goal  was  to  encourage  proper  human- 
automation  calibration  such  that  the  user  relies  on  the  automation  when  he  should  but  does  not 
when  he  should  not. 

The  project’s  three  specific  aims  along  with  research  products  or  student  theses  associated  with 
each  aim  are  below  (and  can  be  found  in  the  appendix): 

Aim  1:  Clarify  how  automation  appearance,  task  type,  and  operator  characteristics  affect 
trust  in  automation 

Publications: 

• Pak, R., McLaughlin, A. C., & Bass, B. (2014). A Multi-level Analysis of the Effects of Age
and Gender Stereotypes on Trust in Anthropomorphic Technology by Younger and Older
Adults. Ergonomics.

• Rovira, E., Pak, R., & McLaughlin, A. C. (under review). Low Memory, Mo' Problems:
Effects of individual differences on types and levels of automation. Human Factors.

Conference Proceedings:

•  Bass,  B.  M.,  Goodwin,  M.,  Brennan,  K.,  Pak,  R.,  &  McLaughlin,  A.  C.  (2013).  Effects  of 
age  and  gender  stereotypes  on  trust  in  an  anthropomorphic  decision  aid.  Proceedings  of  the 
Human  Factors  and  Ergonomics  Society  Annual  Meeting,  57(1),  1575-1579. 

• Leidheiser, W., & Pak, R. (2014). The Effects of Age and Working Memory Demands on
Automation-Induced Complacency. Proceedings of the Human Factors and Ergonomics
Society Annual Meeting, 58(1), 1919-1923. doi: 10.1177/1541931214581401

Student  Thesis: 

• Leidheiser, W. (in progress). The Effects of Age and Working Memory Demands on
Automation-Induced  Complacency. 

Aim 2: Determine if emotional expression can assist in optimal human-automation
calibration

Student  Thesis: 

• Bass, B. (2014). Faces as Ambient Displays: Assessing the Attention-Demanding
Characteristics of Facial Expressions. Unpublished master’s thesis. Available at:
http://tigerprints.clemson.edu/all_theses/1941/ 

Conference  Proceedings: 

• Bass, B. M., & Pak, R. (2012). Faces as Ambient Displays: Assessing the
attention-demanding characteristics of facial expressions. Proceedings of the Human Factors and
Ergonomics  Society  Annual  Meeting,  56(1),  2142-2146. 

Aim 3: Investigate how anthropomorphically designed automation affects automation
error attributions

Student  Thesis: 

•  Branyon,  J.  (in  progress).  Investigating  older  adults'  trust,  causal  attributions,  and 
perception  of  capabilities  in  robots  as  a  function  of  robot  appearance,  task,  and  reliability. 

Conference  Poster: 

•  Branyon,  J.  J.,  &  Pak,  R.  (2015).  Investigating  older  adults’  trust,  attributions,  and  capability 
perceptions  of  robots.  Presented  at  the  American  Psychological  Association  123rd  Annual 
Meeting. Toronto, ON: American Psychological Association.

This report is organized around the three aims. In the course of this project, research efforts
toward  Aim  1  were  expanded  to  include  the  influence  of  individual  differences  (study  2).  Aim  3 
was modified to examine the research question in the context of human-robot interaction.

Aim 1: Clarify how automation appearance, task type, and operator characteristics affect
trust  in  automation 

Study  1:  The  effect  of  agent  age  and  gender  on  trust  in  anthropomorphic  automation  in  younger 
and  older  adults 

Executive  summary 

Previous  research  has  shown  that  gender  stereotypes,  elicited  by  the  appearance  of  the 
anthropomorphic  technology,  can  alter  perceptions  of  system  reliability.  The  current  study 
examined  whether  stereotypes  about  the  perceived  age  and  gender  of  anthropomorphic 
technology  interacted  with  reliability  to  affect  trust  in  such  technology.  Participants  included  a 
cross-section  of  younger  and  older  adults.  Through  a  factorial  survey,  participants  responded  to 
health-related  vignettes  containing  anthropomorphic  technology  with  a  specific  age,  gender,  and 
level  of  past  reliability  by  rating  their  trust  in  the  system.  Trust  in  the  technology  was  affected 
by  the  age  and  gender  of  the  user  as  well  as  its  appearance  and  reliability.  Perceptions  of 
anthropomorphic  technology  can  be  affected  by  pre-existing  stereotypes  about  the  capability  of  a 
specific  age  or  gender. 

Introduction 

Interactive  computer  systems  that  exhibit  human-like,  or  anthropomorphic,  traits  can  lead  users 
to  perceive  and  treat  them  differently  than  non-human-like  systems  (Nass,  Steuer,  &  Tauber, 
1994).  Thus  it  is  imperative  to  understand  how  users’  perceptions  of  the  system  might  be 
affected  by  their  social  reactions  to  anthropomorphic  technology.  One  way  in  which  a  system 
may  elicit  social  reactions  is  by  eliciting  stereotypes  (Yee,  Bailenson,  &  Rickerson,  2007). 

Stereotypes  are  preconceptions  about  the  traits,  behaviour,  or  abilities  of  a  group  and  can  set 
expectations  of  a  stereotyped  individual.  Stereotypes  can  have  both  negative  and  positive 
connotations  that  may  be  inconsistent  with  real  group  attributes  but  provide  adaptive  value 
because  they  filter  and  organize  incoming  information,  thereby  easing  processing  and 
interpretation  (Hilton  &  von  Hippel,  1996).  Stereotypes  can  be  activated  and  applied  with  or 
without  conscious  awareness  (Greenwald  &  Banaji,  1995;  Banaji,  Hardin,  &  Rothman,  1993). 
Unfortunately,  when  the  stereotype  is  highly  simplified  or  inaccurate,  it  can  lead  to  errors  in 
perceptions  and  behavior. 

Stereotype  activation  for  computerized  agents  can  also  interact  with  individual  differences,  such 
as  physical  characteristics.  Qiu  and  Benbasat  (2009)  found  that  an  anthropomorphic  decision  aid 
significantly  increased  perceptions  of  social  presence  and  led  to  increased  trust  of  the  agent.  The 
strength  of  these  effects  was  influenced  by  the  degree  to  which  the  decision  aid  agent  was  similar 
to  the  user  on  a  visible  factor,  such  as  ethnicity.  The  link  between  trust  and  apparent  physical 
characteristics  was  explained  via  similarity-attraction  theory  that  predicted  that  people  would  be 
more  attracted  to  those  similar  to  them  (Byrne,  1971).  The  user  may  have  attributed  their 
attraction  to  a  similar  ethnicity  as  trustworthiness  of  the  agent. 

In another example of the moderating role of individual differences in susceptibility to
anthropomorphic effects, Johnson, Gardner, and Wiles (2004) found that susceptibility to
flattery (insincere praise) from a computer depended on the user’s level of computer
experience: the judgments of highly experienced users were more affected by flattery
than those of less experienced users. Further, Lee (2010) found that people who exhibited a less
analytical and more intuitive cognitive style were more susceptible to flattery from a computer.

In  sum,  stereotypes  can  affect  user  perceptions  of  a  computer  or  automated  aid  and  can  be 
moderated  by  individual  differences.  Some  of  the  aids  described  in  the  previous  studies  were 
forms  of  automation  that  functioned  in  a  decision-support  capacity;  thus  some  automation  bias 
may  be  based  on  stereotypes  (Skitka,  Mosier,  &  Burdick,  1999).  However,  no  research  has 
explicitly  examined  how  these  factors  might  interact  with  machine-related  factors  of  automation, 
such  as  reliability  of  the  automation,  or  how  various  activated  stereotypes  might  interact  (e.g., 
age  and  gender). 

Using  participants  in  younger  and  older  adult  age  groups,  we  collected  judgments  of  trust  of  a 
simulated  agent  embedded  within  a  decision  aid  that  varied  in  gender,  age,  and  reliability  using  a 
factorial  survey  with  concrete  health-related  vignettes.  Following  the  social  cognition  literature, 
we  expected  that  age  and  gender  stereotypes  would  most  affect  trust  in  the  decision  aid  when 
system  performance  was  ambiguous,  but  that  there  would  be  different  effects  for  different  age 
groups  and  genders  of  users.  Specific  research  aims  were:  1)  Determine  the  amount  of  variance 
in  trust  due  to  within-person  variation  compared  to  between-person  variation,  2)  Determine  how 
age  of  the  agent,  gender  of  the  agent,  and  reliability  of  the  decision  aid  agent  affected  judgments 
of  trust  in  the  aid,  and  3)  Determine  how  individual  differences  such  as  age  and  gender  of  the 
participant  affected  trust  ratings  of  various  decision  aids.  The  results  informed  basic  knowledge 
of  how  differing  age  and  gender  groups  responded  to  stereotypes  as  well  as  informing  the  design 
of  decision  aids  targeting  particular  groups  of  users. 
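
As an illustration of the first research aim, the split between within-person and
between-person variance can be summarized with an intraclass correlation (ICC). The sketch
below is not the analysis reported in Pak, McLaughlin, & Bass (2014); it assumes a balanced
design in which every participant rates the same number of vignettes, and it uses one-way
ANOVA estimators. The function name and the example ratings are hypothetical.

```python
from statistics import mean


def variance_components(ratings_by_person):
    """Estimate between-person variance, within-person variance, and the ICC.

    ratings_by_person maps a participant ID to that participant's trust ratings.
    Assumes a balanced design: every participant has the same number of ratings.
    """
    groups = list(ratings_by_person.values())
    k = len(groups)            # number of participants
    n = len(groups[0]) if groups else 0  # ratings per participant
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 participants with equal counts (>= 2) of ratings")

    grand_mean = mean(x for g in groups for x in g)
    person_means = [mean(g) for g in groups]

    # One-way ANOVA mean squares.
    ms_between = n * sum((m - grand_mean) ** 2 for m in person_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, person_means) for x in g
    ) / (k * (n - 1))

    within_var = ms_within
    # A negative estimate is truncated at zero.
    between_var = max((ms_between - ms_within) / n, 0.0)

    total = between_var + within_var
    icc = between_var / total if total > 0 else 0.0
    return between_var, within_var, icc


# Hypothetical ratings: three participants, four vignettes each.
example = {"p1": [2, 3, 2, 3], "p2": [5, 6, 5, 6], "p3": [4, 4, 5, 5]}
between, within, icc = variance_components(example)
print(f"between={between:.2f} within={within:.2f} ICC={icc:.2f}")
```

An ICC near 1 would indicate that most of the variance in trust lies between participants;
an ICC near 0 would indicate that ratings vary mainly across vignettes within each participant.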

We  presented  scenarios  involving  a  decision  aid  (a  smartphone  “app”)  for  diabetes  management 
via  a  factorial  survey.  The  decision  aid  contained  a  simulated  anthropomorphized  agent. 
Factorial surveys have been widely used to examine how beliefs, judgments, and
decision-making are influenced by situational factors (Rossi & Anderson, 1982). Specific factors of the
scenario  were  manipulated  (in  a  factorial  manner)  and  the  participant  rated  all  combinations  of 
factors.  The  agent  was  a  health  care  provider  offering  advice  on  a  specific  diabetes-related 
dilemma.  Because  our  dependent  variable  (trust)  was  a  social  judgment  about  a  situation,  a 
factorial  survey  was  an  ideal  way  to  measure  the  influence  of  manipulated  variables  (age,  gender, 
reliability  of  automation)  as  well  as  individual  differences  of  the  participants  (Rossi  &  Anderson, 
1982;  Hox,  Kreft,  &  Hermkens,  1991). 
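
The fully crossed structure of such a survey can be sketched in a few lines. The example
below enumerates every combination of agent age, agent gender, and reliability. The 45% and
70% reliability values are the low and medium levels discussed in the Conclusion of this
study; the 95% value and all identifier names are placeholders assumed for illustration.

```python
from itertools import product

AGENT_AGES = ("younger", "older")
AGENT_GENDERS = ("male", "female")
# 45 and 70 appear in this report; 95 is an assumed placeholder for the high level.
RELIABILITY_PCT = (45, 70, 95)


def build_vignettes(ages=AGENT_AGES, genders=AGENT_GENDERS, reliabilities=RELIABILITY_PCT):
    """Return one dict per cell of the fully crossed (factorial) design."""
    return [
        {"agent_age": age, "agent_gender": gender, "reliability_pct": rel}
        for age, gender, rel in product(ages, genders, reliabilities)
    ]


vignettes = build_vignettes()
print(len(vignettes))  # 2 ages x 2 genders x 3 reliability levels = 12 cells
```

Because each participant rates every cell, the manipulated factors vary within persons
while participant age and gender vary between persons.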

Methods,  Procedure,  and  Results 

[can  be  found  in  Pak,  McLaughlin,  &  Bass  (2014)  attached  in  Appendix] 

Conclusion 

As  automation  in  consumer  products  and  systems  embodies  human-like  traits  (e.g., 
anthropomorphic  agents),  stereotypes  that  users  hold  of  age  and  gender  may  play  an  important 
role in trust and use of that automation. Prior research established that people apply gender
stereotypes  to  computers  but  the  purpose  of  this  study  was  to  examine  if  powerful  and  pervasive 
age  stereotypes,  as  well  as  gender  stereotypes,  would  be  applied  to  anthropomorphic  agents. 

The  finding  that  trust  varies  with  reliability  is  not  surprising;  with  higher  levels  of  perceived 
reliability, users, particularly older adults, may become complacent (Mouloua, Smither,
Vincenzi, & Smith, 2002; Ho, Wheatley, & Scialfa, 2005). What is surprising is that this
relationship  between  trust  and  complacency  interacts  with  attributes  of  technology  and  individual 
differences  in  a  way  that  is  roughly  consistent  with  the  stereotype  literature,  specifically,  age  and 
gender  stereotypes  of  doctors.  However,  perceived  age  group  and  gender  of  the  agent  and  its 
reliability  moderated  the  application  of  stereotypes.  When  the  agent  appeared  young,  male  agents 
were  more  trusted  than  female  agents  only  when  reliability  was  low.  This  gender  difference 
disappeared  at  other  levels  of  reliability.  This  pattern  might  suggest  that  unless  the  reliability  of 
the  system  is  catastrophically  low  (45%),  most  participants  do  not  exhibit  gender  stereotypic 
thinking;  perceptions  of  trust  are  primarily  driven  by  reliability.  However,  when  the  reliability  is 
very  low,  participants  clearly  shift  to  more  stereotypic  thinking  and  seem  to  attribute  low 
performance  to  gender. 

When  the  agent  appeared  older,  male  agents  were  more  trusted  than  female  agents  only  at 
medium  levels  of  reliability.  That  is,  stereotypic  judgments  appear  at  more  moderate  levels  of 
reliability  (70%  versus  45%)  if  the  aid  is  older  rather  than  younger.  The  finding  of  gender 
stereotypic  effects  at  45%  reliability  when  the  agent  is  young  but  at  70%  when  the  agent  is  old 
seems  to  suggest  that  older  female  agents  are  judged  more  harshly  than  younger  female  agents. 
Given this finding, one design recommendation is that when it is crucial for users to maintain
high levels of trust in imperfect automation, a younger male agent is optimal because it seems
less susceptible to large fluctuations in perceptions of trust as a function of gender (i.e., gender
stereotypic thinking). More specifically, if it is undesirable to have users exhibit gender
differences (or bias) in trust, then younger agents are preferable to older agents. A male
agent is recommended over a female agent because trust in female agents appeared more erratic
as a function of reliability compared to male agents (e.g., the steep plunge in trust at 45%
reliability for young females). However, this design recommendation does not take into account the gender
or  age  group  of  the  user  as  our  results  showed  that  individual  differences  also  seem  to  interact 
with  the  agent  characteristics. 

Some  anthropomorphic  aspects  of  the  aid  did  interact  with  participant  individual  differences  to 
affect  trust.  Younger  adults  in  low  reliability  conditions  tended  to  trust  older  agents  over 
younger  agents  while  older  adults  did  not  show  any  significant  differences  in  trust  as  a  function 
of  agent  age.  Based  on  Model  3,  if  the  goal  is  to  maintain  high  levels  of  trust  in  imperfect 
automation  in  young  adult  users,  older  agents  (regardless  of  agent  gender)  are  preferred.  For 
older  adult  users,  there  was  no  significant  difference  in  trust  as  a  function  of  agent  age  group. 
However,  there  did  appear  to  be  a  trend  toward  higher  trust  of  younger  agents  with  increasing 
reliability  so  for  older  users,  a  young  agent  may  be  optimal. 

One  caveat  is  that  we  did  not  assess  a  priori  the  pre-existing  stereotypes  held  by  our  participants 
(as such an assessment might have influenced their behavior in the experiment). However, the
stereotype literature is replete with research that shows the pervasiveness of the "warm but not
competent" stereotype of older adults not only in the United States but worldwide (Cuddy,
Norton, & Fiske, 2005). Another limitation is the use of a diabetes scenario. Although none of
the participants in our study reported having diabetes, older adults may be more aware of
diabetes simply because it is more common in their cohort than among younger adults (26.9%
versus 11.3% respectively; American Diabetes Association, 2011). Thus, simply being in a
cohort that is more affected by diabetes may influence how one perceives diabetes advice.
A final limitation is that we assessed only subjective perceptions of the automation (trust),
and it is uncertain whether trust translates to behavior. However, past research has shown that
perceptions of trust in automation are strongly correlated with behavior (e.g., Lee & Moray,
1994).

Study 2: The effect of individual differences in working memory on trust and performance with
automation  of  varying  degrees 

Executive  summary 

We  explored  the  extent  to  which  individual  differences  in  cognitive  ability  affected  the  use  of 
types  and  levels  of  automation  support  in  a  complex  decision-making  task.  Previous  studies 
show  performance  benefits  with  reliable  automation  but  performance  costs  with  imperfect 
automation,  particularly  as  automation  support  increases.  Cognitive  abilities  are  also  critical  to 
decision-making  and  correlate  with  automation  reliance.  We  examined  decision-making 
performance  with  varying  types  and  levels  of  imperfect  automation  that  supported  86 
participants  performing  a  simulated  command  and  control  task.  Participants  also  completed  a 
spatial  working  memory  task.  Reliable  automation  with  increased  automation  support  resulted  in 
higher  accuracies.  When  automation  failed,  the  reverse  was  true:  increased  automation  support 
resulted  in  lower  accuracy,  especially  for  those  with  lower  working  memory  ability.  Those  with 
higher  working  memory  were  less  susceptible  to  the  detrimental  effects  when  seemingly 
supportive  automation  failed.  Further,  lower  working  memory  was  associated  with  more  trust  in 
automation.  These  results  confirm  the  link  between  automation  performance  and  individual 
differences,  but  also  demonstrate  the  limits  of  the  “conventional  wisdom”  that  higher,  reliable 
automation  support  unilaterally  helps  performance  while  higher,  imperfect  automation  support 
harms performance (cf. Onnasch, Wickens, Li, & Manzey, 2014).

Introduction 

A  growing  body  of  research  has  examined  how  human  performance  is  differentially  affected  by 
various  types  and  levels  of  highly  reliable  but  imperfect  automation  (Crocoll  &  Coury,  1990; 
Endsley  &  Kaber,  1999;  Galster,  Bolia,  &  Parasuraman,  2002;  Lorenz,  Di  Nocera,  Rottger,  & 
Parasuraman,  2002;  Sarter  &  Schroeder,  2001;  Wickens  &  Xu,  2002;  Rovira,  McGarry,  & 
Parasuraman,  2007;  Onnasch,  Wickens,  Li,  &  Manzey,  2014).  The  interest  is  motivated  by  the 
severe  human  performance  consequences  of  highly  reliable,  yet  imperfect  automation  such  as: 
out  of  the  loop  unfamiliarity  (Wickens,  1992),  automation  complacency  (Parasuraman,  Molloy, 
&  Singh,  1993),  loss  of  situation  awareness  (Endsley  &  Kiris,  1995),  and  skill  degradation 
(Bainbridge,  1983). 

In a meta-analysis of 18 automation studies examining the differential effects of types and levels
of  automation,  Onnasch  et  al.  (2014)  found  performance  benefits  for  reliable  automation  and 
performance  decrements  after  an  automation  failure  with  decision  automation  and  increased 
levels  of  automation.  Of  most  interest  were  the  decrements  in  performance  found  when 
automation  support  moved  across  the  critical  boundary  from  information  automation  to  decision 
automation;  a  change  in  type  of  automation.  Thus,  an  important  goal  for  designers  is  to  mitigate 
performance  costs  associated  with  failures  of  decision  automation  and  failures  at  increased  levels 
of  automation  by  facilitating  appropriate  trust  calibration  (e.g.,  Rovira,  Cross,  Leitch,  & 
Bonaceto,  2014).  One  approach  is  to  better  understand  how  individual  differences  in  cognitive 
ability  affect  the  appropriate  use  of  imperfect  types  and  levels  of  automation  in  complex 
decision-making tasks.

In a study examining automation performance and individual differences in cognitive abilities,
Chen and Terrence (2009) investigated the effects of imperfect automation and individual
differences in a military multitask environment. Specifically, they were interested in whether individual
differences  in  a  component  of  working  memory  capacity,  perceived  attentional  control 
(Shipstead,  Lindsey,  Marshall,  &  Engle,  2014),  impacted  how  operators  interacted  with  miss 
versus  false  alarm  prone  automation.  Attentional  control  was  assessed  using  a  subjective 
measure  of  individuals’  perceived  attentional  focus  and  shifting.  They  found  that  individuals 
with  high  perceived  attentional  control  were  more  negatively  affected  by  false  alarms,  while 
individuals  with  low  perceived  attentional  control  suffered  more  with  miss-prone  automation.  In 
the  context  of  their  task  (military  gunner  and  robotics  operator),  perceived  attentional  control 
was  an  important  moderator  of  how  operators  reacted  to  automation  false  alarms  and  misses. 

Individual  differences  in  working  memory  also  seem  to  play  a  role  in  mediating  operator 
performance  with  automation.  Parasuraman,  de  Visser,  Lin,  and  Greenwood  (2012)  examined 
whether  certain  genotypes  could  predict  an  individual’s  susceptibility  to  automation  bias 
(adhering  to  imperfect  automation).  Researchers  looked  at  two  specific  single  nucleotide 
polymorphisms (SNPs), or variants, of the DBH gene that regulate dopamine (DA) and
norepinephrine (NE). DA and NE levels are associated with DBH enzyme activity (low, high)
that contributes to neural activity in the prefrontal cortex known to play a critical role in working
memory ability. Using a command and control task (Rovira et al., 2007), Parasuraman et al.
(2012)  varied  the  automation  support  (manual,  reliable,  and  automation  failure)  that  low  and 
high  DBH  enzyme  groups  experienced.  They  found  no  difference  between  the  low  and  high 
DBH  enzyme  groups  with  manual  and  reliable  automation,  but  with  automation  failures 
individuals  in  the  low  DBH  enzyme  group  performed  better  compared  to  individuals  in  the  high 
DBH  enzyme  group.  Parasuraman  et  al.  (2012)  attributed  this  effect  to  individual  differences  in 
working  memory  induced  by  enhanced  DA  availability  in  the  low  DBH  enzyme  group. 

However,  because  they  did  not  measure  working  memory  or  other  cognitive  abilities,  it  is  still 
unclear  if  individual  differences  in  working  memory  interact  with  automation  reliability  to  affect 
performance. 

The  importance  of  individual  differences  in  working  memory  was  examined  in  another  study  (de 
Visser,  Shaw,  Mohamed-Ameen,  &  Parasuraman,  2010).  Researchers  investigated  the  role  of 
working  memory  in  an  automated  UAV  task  by  varying  task  load  (low,  high)  and  automation 
reliability  (manual,  reliable,  and  automation  failure).  Participants  completed  both  the  Operation 
Span (OSPAN) and Spatial Span (SSPAN) working memory tests (Engle, 2002). Researchers
found a significant correlation between OSPAN scores and performance on the automated task. For
each  automation  task  performance  measure,  they  found  that  linear  models  that  included  working 
memory  accounted  for  more  of  the  variance  in  performance  as  compared  to  the  linear  models 
without  the  individual  differences  OSPAN  measure.  Thus,  when  individual  differences  in 
working  memory  are  accounted  for,  more  variation  in  performance  with  automation  can  be 
explained. Critically, however, this study did not vary the types or levels of automation.

The  current  research  was  aimed  at  understanding  the  sources  of  performance  differences 
underlying  human-automation  interaction  with  imperfect  automation  across  different  types  and 
levels of automation (for a review see Onnasch et al., 2014) as it specifically relates to individual
differences. First, we varied types and levels of imperfect automation and task load. Second, we
measured individual differences in working memory ability by using a performance-based
working memory task rather than self-reported measures of abilities, complex proxy tasks (e.g.,
video game performance), or genetic predictors of cognitive performance. Finally, we
systematically  varied  primary  task  demand:  evidence  from  a  review  of  20  automation  reliability 
studies  suggested  that  dependence  on  imperfect  automation  would  be  stronger  with  increased 
task  demand  (because  the  operator’s  limited  resources  are  expended;  Wickens  &  Dixon,  2007). 
We  hypothesized  that  individual  differences  in  working  memory  would  differentially  impact 
reliance  on  varying  types  and  levels  of  automation.  Specifically: 

1)  First,  consistent  with  previous  literature,  we  hypothesized  that: 

a)  operators  would  perform  better  with  reliable  automation  compared  to  manual  control. 

b)  there  would  be  no  difference  between  task  load  conditions  when  the  automation  was 
reliable. 

c)  the  differential  impact  of  information  versus  decision  automation  would  be  evident  with 
automation  failures,  especially  when  task  load  was  high. 

2)  Second,  as  suggested  by  Parasuraman  et  al.  (2012),  we  expected  individuals  with  higher 
working  memory  ability  to  show  less  of  a  decrement  when  formerly  supportive  automation 
failed  compared  to  individuals  with  lower  working  memory  ability.  Specifically,  with 
automation  failures,  high  task  load,  and  increasing  automation  support  it  was  predicted  that 
the  benefits  of  better  spatial  working  memory  ability  would  be  highlighted. 

3)  Third,  we  expected  a  relationship  between  variations  in  cognitive  ability  and  self-report 
measures  of  trust.  Specifically,  individuals  with  lower  working  memory  abilities  would  trust 
the  automation  more  compared  to  individuals  with  higher  spatial  working  memory  abilities 
because  individuals  with  lower  working  memory  abilities  would  need  to  rely  on  the 
automation  more  than  those  with  higher  working  memory  abilities. 

Methods,  Procedure,  and  Results 

[can  be  found  in  Rovira,  Pak,  &  McLaughlin,  (under  review),  attached  in  appendix] 

Conclusion 

The  extent  to  which  automation  enhances  decision-making  depends  on  individual  differences  in 
cognitive  ability.  Using  a  simulated  automated  targeting  task,  we  showed  that  the  extent  to 
which  an  operator  experienced  both  the  costs  of  automation  failures  and  the  benefits  of  reliable 
automation  depended  on  individual  differences  in  working  memory.  This  finding  may  help 
optimize  human-automation  interaction.  Further,  our  findings  that  working  memory  ability  is 
related  to  trust  in  automation  suggest  more  work  should  consider  this  individual  difference. 

Our  study  replicated  prior  research  that  operators  would  perform  better  with  reliable  automation 
compared to manual control (Hypothesis 1a). In addition, task load did not differentiate
performance when the automation was reliable (Hypothesis 1b). Finally, our study showed that
with automation failures, there was no difference in accuracy with information automation and
low-decision automation between low and high task load but accuracy declined at high task load
with medium-decision automation (Hypothesis 1c). These results demonstrate an interesting difference
between  lower  automation  (information  and  low-decision)  and  higher  automation  (medium- 
decision).  It  appears  that  lower  automation  can  mitigate  some  of  the  performance  penalty  of 
increased task load when automation fails while performance significantly declines with
automation  failures  and  higher  types  and  levels  of  automation.  The  drop  in  decision  accuracy 
with  increased  task  load  occurs  because  the  further  along  the  information-processing  continuum 
that  automation  supports  the  operator  (e.g.,  cognitive  versus  perceptual),  the  more  detrimental 
automation  failures  are  because  operators  will  not  have  generated  their  own  courses  of  action 
(Wickens  &  Xu,  2002). 

A  critical  hypothesis  regarded  the  role  of  individual  differences  and  automation  performance 
(Hypothesis 2). The multilevel model (MLM) showed a cross-level interaction between working memory, trial
reliability,  and  automation  support.  Performance  was  generally  positively  affected  by  increasing 
automation  but  especially  for  those  with  lower  working  memory.  Indeed,  with  reliable 
automation  support  above  information  automation,  working  memory  did  not  differentiate 
accuracy.  Low  and  medium-decision  automation  may  have  reduced  the  working  memory 
demands  of  the  task.  Thus,  reliable  and  increased  automation  support  was  especially  beneficial 
for  those  with  lower  working  memory  (with  maximal  differences  by  working  memory  for 
information  automation). 

When  automation  failed,  all  participants’  accuracies  declined  as  the  type  and  level  of  automation 
increased.  However,  those  with  lower  working  memory  were  more  severely  impacted  by 
automation  failures  than  those  with  higher  working  memory.  Taken  together,  these  results 
confirmed Hypothesis 2 regarding the effects of type and level of automation and working
memory. These results also added detail to the conventional wisdom that increasing automation
type or level benefits performance but can lead to catastrophic performance when automation
fails (i.e., the lumberjack effect; Onnasch et al., 2014). When automation support was low but
reliable,  those  with  higher  working  memory  outperformed  those  with  lower  working  memory, 
and  when  automation  failed,  those  with  lower  working  memory  suffered  more  than  those  with 
higher working memory. Our results are the first empirical confirmation of the link between
automation performance and individual differences in working memory suggested by previous
researchers (de Visser et al., 2010; Parasuraman, 2012), and they extend the literature by further
specifying the automation conditions (type and level of automation support and trial reliability)
under which working memory affects performance.

Finally, Hypothesis 3, which predicted a relationship between working memory and trust in
automation, was supported. We found that working memory was weakly but significantly
negatively correlated with measures of trust. Specifically, individuals with higher working memory
ability had lower trust, lower reliance, and weaker beliefs that automation would improve their
performance.

DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Aim  2:  Determine  if  emotional  expression  can  assist  in  optimal  human-automation 
calibration 

Study  1:  Faces  as  ambient  displays 

Executive  summary 

Ambient  displays  are  used  to  provide  information  to  users  in  a  non-distracting  manner.  The 
purpose  of  this  research  was  to  examine  the  efficacy  of  facial  expressions  as  a  method  of 
conveying  information  to  users  in  an  unobtrusive  way.  Specifically,  the  current  study  assessed 
the  attention-demanding  characteristics  of  facial  expressions  using  the  dual-task  experiment 
paradigm. Results from the experiment suggest that Chernoff facial expressions are decoded
most accurately when happy facial expressions are used. There was also an age effect on
decoding accuracy: younger adults had higher facial expression decoding performance
than older adults. The observed decoding advantages for happy facial expressions and
younger  adults  in  the  single-task  were  maintained  in  the  dual-task.  The  dual-task  paradigm 
revealed  that  the  decoding  of  Chernoff  facial  expressions  required  more  attention  (i.e.,  longer 
response  times  and  more  face  misses)  than  hypothesized,  and  did  not  evoke  attention-free 
decoding. Chernoff facial expressions do not appear to be good ambient displays due to their
attention-demanding nature.

Introduction 

Ambient displays can take many forms. Examples include the battery meter icon of a computer
interface and a string dangling from the ceiling to represent network traffic on a computer
network (Weiser & Brown, 1995). These examples are considered “ambient” because they
convey  information  to  the  user  without  being  substantially  taxing  on  cognitive  faculties  (i.e.,  they 
are  in  the  background  and  do  not  require  the  user  to  change  focus  or  switch  attention).  Several 
important  characteristics  have  been  identified  for  the  design  of  a  good  ambient  display. 

Examples  of  these  characteristics  include:  providing  useful  and  relevant  information,  having  a 
sufficient  information  design,  using  consistent  and  intuitive  mapping,  and  appropriate  matching 
between the system and the real world (Mankoff et al., 2003). If these characteristics are
adequately  fulfilled  by  facial  expressions,  then  facial  expressions  could  be  considered  a  good 
form  of  ambient  display.  The  purpose  of  this  study  is  to  determine  if  face  stimuli  can  serve  as 
ambient  indicators  of  quantitative  information. 

One  situation  where  ambient  displays  may  be  helpful  is  in  human-automation  interaction  (HAI). 
In  some  HAIs,  users  may  become  unaware  of  the  hidden  decision  making  processes  or  outcomes 
of  automation.  They  may  also  lose  track  of  the  automation’s  reliability  over  time  (i.e.,  forget  how 
reliable or unreliable it has been in the past). Such uncertainty (about current processes and
past reliability) can lead to fluctuations in trust that may not be justified (un-calibrated trust); that
is, trust that may be unwarranted. Un-calibrated trust can manifest itself as continued use of
unreliable automation (misuse) or unwarranted discontinued use of reliable automation (disuse),
both of which cause non-optimal HAIs (Parasuraman, 1997).

One way in which an automated system can encourage proper calibration is by presenting as
much  information  about  its  operation  as  possible.  For  example,  it  could  present  its  own 
confidence in its recommendation, so-called “system confidence”, or it could present a historical
picture of its own reliability (both kinds of information are easily accessible to a system). This
concept  can  be  categorized  in  the  ambient  display  heuristic  of  useful  and  relevant  information. 
For  example,  if  the  system  is  working  from  faulty  data,  it  will  weight  its  advice  as  potentially 
unreliable.  Presenting  critical  information,  such  as  system  confidence,  is  a  way  of  diminishing 
the  uncertainty  that  can  exist  in  HAIs  (Bubb-Lewis  &  Scerbo,  1997).  Trust  is  a  malleable 
variable  that  can  be  shaped  through  interactions  with  a  system  (Antifakos,  Kern,  Schiele,  & 
Schwaninger,  2005). 

If  a  system  is  presenting  the  operator  with  its  system  confidence  level,  then  the  operator  will  be 
able  to  build  a  more  appropriate  trust  relationship  with  the  automation.  However,  this 
presentation  needs  to  be  salient  and  the  automation  state  indicator  should  not  add  attentional 
demands  to  the  user  (Parasuraman,  1997).  Some  previous  research  has  indicated  that  methods 
such  as  tactile  output  and  auditory  output  may  be  helpful  in  conveying  system  confidence 
(Wisneski,  1999;  Poupyrev,  Maruyama,  &  Rekimoto,  2002;  Sawhney  &  Schmandt,  2000).  While 
these modalities are novel in certain capacities, a less intrusive and less attention-demanding
modality would be more beneficial to users. Thus, the ideal stimulus display type would be one
that provides the user with meaningful information, while not becoming a distraction or a drain
on the user’s attention (Antifakos, Kern, Schiele, & Schwaninger, 2005). Coding information
as  emotional  expression  in  human-like  faces  may  fulfill  this  role. 

Neuroimaging  studies  have  supported  the  notion  that  the  emotional  processing  of  faces  is  a  more 
effective  pathway  than  the  processing  of  other  stimuli.  A  previous  study  compared  the  automatic 
processing of emotional facial expressions versus emotional words. Rellecke (2011)
hypothesized  that  facial  expressions  would  be  encoded  more  automatically  than  words,  due  to 
their  perceptual  features  and  humans’  natural  ability  to  encode  them.  This  study  was  novel 
because  it  took  two  theoretically  attention-free  emotional  processing  stimuli  (i.e.,  faces  and 
words),  and  compared  their  efficiency  and  effect.  The  degree  of  encoding  automaticity  was  being 
tested  for  each  of  these  stimuli.  Based  on  the  results  of  the  electroencephalogram  (EEG),  the 
event-related  brain  potentials  (ERPs)  recorded  for  the  facial  expression  conditions  were  found  to 
have  a  prolonged  effect  on  the  brain. 

This finding suggests that emotional facial expression processing is automated to a greater
extent than emotional word processing. Rellecke (2011) discusses the potential necessity of
preconditions for the highly automatic processing of emotional words. This was apparent because
the two stimuli were tested in the same superficial stimulus analysis task, but only one (i.e.,
facial expression) led to advanced pre-attentive processing. Facial expression seems to be a
stimulus  that  needs  no  prompting  or  preconditions  to  allow  fast,  but  also  meaningful  processing 
(Rellecke,  2011).  Data  analysis  found  that  happy  faces  were  decoded  earlier  than  other  faces 
(i.e.,  50-100  ms). 

This  supports  the  theory  that  happy  faces  are  advantageous  in  the  early  stages  of  emotional 
processing  and  may  be  instrumental  in  attention-free  encoding.  Also,  data  showed  that  angry 
faces  were  advantageous  for  later  decoding  (i.e.,  150-450  ms).  This  coincides  with  previous 
research that states angry expressions, or threat-related expressions, have prolonged effects on
the  brain  (Rellecke,  2011).  These  differences  in  emotion  type  on  ERPs  show  that  there  may  be  a 
specific  type  of  emotion  that  elicits  faster  decoding  for  humans. 

Chernoff Faces

Chernoff  faces  were  created  as  a  way  to  represent  multivariate  data  in  a  way  that  would  allow  the 
viewer  to  gain  information  in  a  quick,  yet  complete  manner.  For  example,  some  of  the  original 
Chernoff  faces  were  used  to  represent  fossil  data.  The  Chernoff  faces  displayed  information 
pertinent  to  the  fossils  (i.e.,  inner  diameter  of  embryonic  chamber,  total  number  of  whorls, 
maximum height of chambers in last whorl, etc.) through variations in facial features including,
but not limited to, head shape, eye size, mouth size/shape, and eyebrow size/slant. Chernoff’s rationale
was  that  due  to  the  extreme  familiarity  of  faces,  people  would  easily  detect  differences  in  the 
configuration  of  a  face,  even  if  the  differences  were  small  ones  (Chernoff,  1973).  It  was  expected 
that  people  would  at  least  be  able  to  examine  faces  more  quickly  than  examining  a  row  of 
numbers.  Assuming  that  this  is  true,  a  schematic  facial  expression  should  act  as  a  superb  source 
of  information  output. 

Chernoff  faces  have  up  to  18  characteristics  that  can  be  manipulated  (Nelson,  2007).  When 
representing  multivariate  data  (e.g.,  the  fossil  data)  it  is  beneficial  to  have  multiple  facial 
elements  that  can  be  manipulated  and  used  for  representing  various  data.  However,  when 
representing  univariate  data  (i.e.,  a  single  percentage  score)  it  seems  that  having  a  lower  number 
of  manipulated  facial  features  is  more  beneficial.  Therefore,  it  could  be  problematic  to  have 
several  individual  facial  elements  for  the  human  to  properly  decode.  As  Montello  and  Gray 
(2005)  state,  it  is  more  beneficial  to  have  a  stimulus  that  communicates  information  univariately 
rather  than  multivariately  when  the  goal  is  to  give  the  user  a  single  quantity.  A  pseudo-Chernoff 
face may be a remedy for this dilemma (Montello & Gray, 2005). This “pseudo-Chernoff” face
could  be  created  by  systematically  manipulating  one  facial  characteristic,  while  holding  all 
others  constant.  To  properly  convey  a  simple  quantitative  score  the  Chernoff  face  may  only  need 
to  have  one  facial  characteristic  manipulated.  Through  this  manipulation,  the  human  may  be 
more  apt  to  decode  the  Chernoff  face  accurately  and  quickly,  while  noticing  subtle  changes 
(Kabulov,  1992). 
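
The single-feature manipulation described above can be sketched in code. The following is a minimal illustration only, not the stimulus-generation procedure used in the study: the function name `mouth_points`, the parabolic mouth shape, and the linear mapping from intensity to curvature are all assumptions made for this example.

```python
def mouth_points(intensity, emotion="happy", n_points=5):
    """Return (x, y) points tracing the mouth of a pseudo-Chernoff face.

    Hypothetical mapping: intensity is a percentage (0-100) and only the
    mouth curvature varies; every other facial feature is held constant.
    """
    if not 0 <= intensity <= 100:
        raise ValueError("intensity must be between 0 and 100")
    if emotion not in ("happy", "sad"):
        raise ValueError("emotion must be 'happy' or 'sad'")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    # Positive curvature lifts the mouth corners (smile); negative lowers them (frown).
    sign = 1.0 if emotion == "happy" else -1.0
    curvature = sign * intensity / 100.0

    # Sample x evenly from -1 (left corner) to 1 (right corner).
    xs = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]

    # Parabola: the centre stays at y = 0 and the corners move by `curvature`.
    return [(x, curvature * x * x) for x in xs]
```

Under these assumptions, an intensity of 0 yields a flat mouth, and the quantity being displayed is carried entirely by how far the mouth corners rise or fall.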

The  issue  of  whether  interpreting  Chernoff  faces  is  a  relatively  less  attention-demanding  task  is 
of  primary  importance  to  the  current  study.  Previous  studies  have  investigated  the  effectiveness 
of  Chernoff  faces  as  a  pre-attentive  stimulus  with  mixed  results.  A  study  concluded  that  Chernoff 
faces  are  not  processed  pre-attentively,  and  do  not  benefit  users  more  than  other  modes  of  visual 
information  display  (Morris,  Ebert,  &  Rheingans,  2000).  The  process  of  identifying  the 
characteristics  (eyebrow  slant,  eye  size,  nose  length)  of  the  Chernoff  face  was  said  to  be  a  serial 
process. Participants’ accuracy of target stimuli identification improved when they were given
more time and fewer distracters, indicating that the task was not pre-attentive (Morris, Ebert, &
Rheingans, 2000). A similar study investigated data visualization and used Chernoff faces as one
of the “glyph stimuli” to discover which data visualizations were the most effective (Lee, Reilly,
&  Butavicius,  2003).  Glyphs  are  data  visualizations  that  are  characterized  by  their  attempt  to 
display  multivariate  data  through  the  manipulation  of  features  on  the  glyph  that  correspond  to 
raw  data.  It  was  found  that  participants  had  lower  accuracy  scores  and  took  longer  to  answer 
questions when exposed to the glyph stimuli (Lee, Reilly, & Butavicius, 2003). This indicates a
serial  processing  of  information  from  the  Chernoff  faces,  which  is  in  agreement  with  the  findings 
of  Morris,  Ebert,  &  Rheingans  (2000). 

Age-Related  and  Cultural  Effects  on  Decoding 

Despite  the  ease  with  which  humans  are  able  to  decode  emotional  facial  expressions,  it  is  still 
moderated  by  age.  Age  can  alter  a  person’s  ability  to  correctly  perceive  and  understand  the  facial 
expression  that  is  presented  to  them.  Neuropsychological  research  has  shown  that  age-related 
issues  in  facial  expression  decoding  may  be  a  result  of  problems  with  the  medial  temporal  lobe 
(Orgeta & Phillips, 2007). The amygdala is housed here, which is consistent with previous
research that suggests the amygdala is necessary for facial expression decoding (Whalen, 1998;
Morris, 1998). Despite these age-related issues, a competing theory has been asserted regarding
older adults’ ability to decode emotional facial expressions. The socioemotional selectivity
theory asserts that social behavior is essentially a byproduct of time (Carstensen, Isaacowitz, &
Charles,  1999).  In  a  sense,  time  can  be  thought  of  as  the  chronological  age  of  a  human.  As  the 
human  ages,  they  essentially  have  less  time  to  live  and  fulfill  goals.  This  affects  the  way  they 
view  their  decisions  and  weight  their  goals.  The  two  types  of  goals  that  make  up  the 
socioemotional  selectivity  theory  are  knowledge-based  and  emotion-based  goals  (Carstensen, 
Isaacowitz, & Charles, 1999). Younger adults are more likely to pursue knowledge-based goals
because they have more time potential. The trade-off of pursuing knowledge in lieu of emotional goals
appears to be a worthy endeavor. Older adults supposedly take the opposite approach and view
emotion-based goals as top priority. Older adults view time as a non-renewable resource, and
seek to spend any time they have left enjoying positive emotional experiences (Carstensen,
Isaacowitz, & Charles, 1999).

According to the socioemotional selectivity theory, older adults may actually be more aware of
certain emotional situations and images than of non-emotional ones (Orgeta & Phillips, 2007).

Orgeta and Phillips (2007) showed older adults as being more accurate at identifying positive
facial expressions, as opposed to negative facial expressions. Older adults were found to identify
positive  emotions  as  accurately  as  younger  adults.  There  was  no  significant  difference  between 
the  older  adults  and  younger  adults  in  terms  of  identifying  positive  facial  emotions  (i.e., 
happiness  and  surprise).  However,  older  adults  were  significantly  worse  than  younger  adults  at 
identifying  negative  facial  emotions  (i.e.,  sadness,  anger,  and  fear).  The  results  of  this  study 
indicated  that  there  is  an  age-related  difference  for  the  decoding  of  negative  facial  expressions, 
but  not  positive  facial  expressions  (Orgeta  &  Phillips,  2007).  The  ease  of  recognition  for  certain 
emotional  expressions  versus  others  is  an  area  that  is  pertinent  to  this  research  area.  As  Orgeta 
and  Phillips  (2007)  showed,  older  adults  may  have  a  positivity  bias  that  allows  them  to  overcome 
any  cognitive  decrements  that  interrupt  other  emotional  decoding,  thus  decoding  positive  facial 
expressions as accurately as younger adults. Other research has shown that
positive expressions (e.g., happiness) are processed more quickly, as indicated by faster N170
latencies (Batty & Taylor, 2003). Perhaps this quick processing contributes to the robustness of the
happy facial expression compared to other expressions.

A  previous  study  manipulated  the  factors  of  chronological  age  and  the  participant’s  working  self- 
concept  to  determine  if  the  positivity  effect  could  in  fact  be  evoked  in  younger  adults,  and 
likewise the negativity effect in older adults (Lynchard & Radvansky, 2012). During the
experiment  the  participant  would  complete  a  possible  selves  orienting  task.  The  older  adults 
completed  the  younger  possible  selves  orienting  task,  while  the  younger  adults  completed  the 
older  possible  selves  orienting  task.  Essentially,  this  made  the  participant’s  working  self-concept 
the  opposite  of  their  chronological  age.  The  results  showed  that  there  was  a  reversal  of 
stereotypical  age-related  emotional  information  processing.  Younger  adults  displayed  a  positivity 
effect,  which  is  thought  to  be  a  unique  attribute  of  older  adults.  Similarly,  older  adults  displayed 
a  negativity  effect,  which  is  thought  to  be  unique  to  younger  adults  (Lynchard  &  Radvansky, 
2012).  This  study  showed  that  more  than  just  chronological  age  plays  a  role  in  the 
socioemotional  selectivity  theory.  Humans  are  subject  to  emotional  information  processing 
biases  based  on  less  concrete  variables  such  as  their  working  self-concept. 

Decoding  facial  expressions  is  a  cross-cultural  behavior  that  is  a  critical  part  of  human  life.  There 
are  six  basic  emotions  that  transcend  culture.  These  are:  anger,  happiness,  fear,  surprise,  disgust, 
and sadness (Ekman & Friesen, 1975). These emotions can be represented with facial
expressions (Lee, 2006; Batty, 2003). Because these facial expressions are not confined to
specific cultures, there are no constraints on the ability of different groups of people to successfully
decode these facial expressions. It appears that increasing age is a factor that may cause
differences  in  aspects  of  facial  expression  decoding,  while  cultural  background  seems  to  be  of  no 
hindrance.  The  unique  quality  that  facial  expressions  have  in  their  prevalence  and  familiarity  in 
human  culture  makes  them  a  good  candidate  for  an  ambient  display.  This  quality  of  facial 
expressions  allows  the  heuristic  of  matching  the  system  to  the  real  world  to  be  met. 

Limitations  of  Previous  Literature 

The  previous  literature  has  provided  a  foundation  for  knowledge  about  facial  expressions,  but 
there  are  limitations  to  these  studies.  The  Hess  (1997)  study  presented  emotional  facial 
expressions  in  a  single-task  format.  The  participants  viewed  the  image  and  rated  it  on  the 
emotionality  and  intensity  that  they  perceived.  This  methodology  does  not  clarify  whether  facial 
emotion decoding is truly resource/attention-free as neuropsychological studies suggest. A
dual-task experiment should be implemented to properly measure attention usage. To obtain these
data, measures of response time, accuracy, and subjective workload should be used. The Hess
(1997)  study  also  measured  decoding  accuracy  for  each  facial  expression  image  through  the 
presentation  of  several  emotion  scales  at  once.  The  participant  was  presented  with  seven 
emotional  labels,  which  they  manipulated  to  show  the  intensity  of  emotion  for  the  previous 
picture.  Instead  of  presenting  seven  individual  scales,  it  seems  to  be  less  complicated  to  present 
one  scale  or  to  have  a  quick  input  device  (e.g.,  keyboard  number  keys)  after  the  image  is  viewed. 

The  Hess  (1997)  study  presented  facial  expression  intensity  in  increments  of  20  %  intensity.  This 
intensity  scale  may  not  provide  enough  precision  or  a  complete  spectrum  of  facial  expression 
decoding  data.  The  Orgeta  and  Phillips  (2007)  study  also  presented  only  four  intensity  levels. 

The  number  of  intensity  levels  may  need  to  be  increased  (i.e.,  create  smaller  increments  of 
percentage change between successive stimuli) to capture a more accurate representation of
participants’ ability to decode facial expression. Another limitation in the Orgeta and Phillips
(2007) study was that the facial images were presented in increasing order as the participant
advanced through the experiment. This method may have led to participants forming an
anticipation bias that the next facial image was going to be more expressive.

Previous research has also provided evidence that age-related effects may cause differences in
the  ability  for  humans  to  properly  decode  facial  expressions.  It  has  been  shown  that  older  adults 
are  worse  at  identifying  negative  facial  expressions  (i.e.,  sadness,  anger,  and  fear).  Older  adults 
struggled  significantly  versus  younger  adults  in  properly  recognizing  the  negative  emotions  at 
intensity  levels  of  50  %,  75  %,  and  100  %.  It  appears  that  older  adults  have  a  higher  recognition 
threshold  for  certain  negative  emotions  than  younger  adults.  Basically,  older  adults  do  not  pick 
up  on  negative  facial  stimuli  as  easily  as  younger  adults  and  need  more  intense  facial  expressions 
to  determine  the  appropriate  emotional  state  (Orgeta  &  Phillips,  2007).  In  order  to  determine  if 
theories  such  as  the  socioemotional  selectivity  theory  pertain  to  Chernoff  face  recognition,  there 
needs  to  be  an  independent  variable  of  age  with  levels  of  younger  and  older  adults. 

The  variable  of  gender  of  the  facial  expression  stimuli  could  be  considered  a  confounding 
variable.  Hess  (1997)  used  two  male  and  two  female  actors  to  create  facial  expressions  for  their 
study.  Results  of  this  study  showed  that  the  gender  of  the  stimuli  (i.e.,  actors)  did  influence 
participant  rating  accuracy.  For  the  expressions  of  happy  and  sad,  there  was  an  interaction  of  the 
gender  of  the  stimuli  x  intensity  of  the  expression  (Hess,  1997).  Because  of  this  reported 
interaction,  it  would  be  beneficial  to  use  non-gender  specific  stimuli  to  eliminate  this 
confounding  variable. 

Previous  studies  have  looked  at  users’  ability  to  properly  decode  facial  expression  type  (Ekman 
& Friesen, 1975), intensity (Tsurusawa, Goto, Mitsudome, Nakashima, & Tobimatsu, 2007; Hess,
1997), and the effectiveness of Chernoff faces (Chernoff, 1973; Tsurusawa, Goto, Mitsudome,
Nakashima,  &  Tobimatsu,  2007;  Morris,  Ebert,  &  Rheingans,  2000).  The  purpose  of  the  current 
study  is  to  examine  the  users’  ability  to  accurately  decode  a  quantitative  value  from  Chernoff 
facial  expressions. 

Overview  of  the  Study 

In  order  to  determine  the  attention  usage  by  the  participants,  a  dual-task  methodology  was  used. 
Our  study  used  the  dual-task  paradigm  to  measure  the  attention-demanding  characteristics  of 
facial displays. The Hess (1997) study measured participants’ decoding accuracy with several
scales  after  each  trial.  This  method  may  create  confusion  for  the  participant,  and  not  accurately 
record  participant  decoding  time.  The  interface  should  allow  for  quick  and  simple  input  of  the 
facial  expression  intensity  from  the  participant.  The  current  study  used  only  one  measurement 
scale  (direct  key  entry)  after  each  trial  to  eliminate  any  confusion  for  the  participants  about  what 
the  scales  are  measuring  and  give  a  better  approximation  about  how  quickly  the  participant  can 
decode  the  facial  expression.  In  the  Orgeta  and  Phillips  (2007)  study  the  facial  expressions  were 
shown  in  increasing  order.  This  technique  was  not  replicated  in  the  current  study.  Instead,  a 
randomized  sequence  of  facial  expression  stimuli  was  used  to  control  for  any  biases  that  could  be 
formed  due  to  participant  expectations.  The  Chernoff  face  stimuli  were  manipulated  differently 
compared  to  previous  research  (Chernoff,  1973;  Tsurusawa,  Goto,  Mitsudome,  Nakashima,  & 
Tobimatsu,  2007;  Morris,  Ebert,  &  Rheingans,  2000).  Only  the  mouth  was  manipulated  in  order 
to gain understanding about the effect of this one variable on decoding. Finally, the current study
used  a  more  precise  facial  expression  intensity  scale  than  previous  research  (Hess,  1997;  Orgeta 
&  Phillips,  2007).  To  accomplish  this,  a  facial  expression  scale  presenting  emotions  in 
increments of 10 % was used. Our assumption was that by making these modifications the
current  study  would  be  able  to  address  the  research  question  with  more  accuracy. 
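
The randomized presentation order and 10 % intensity increments described above could be generated along the following lines. This is a hedged sketch rather than the study's actual procedure: the function name `build_trials`, the exclusion of a 0 % level, and the intermixing of happy and sad trials within one sequence are assumptions, since this section does not specify them.

```python
import random


def build_trials(emotions=("happy", "sad"), step=10, repeats=1, seed=None):
    """Return a shuffled list of (emotion, intensity) trials.

    Hypothetical design: intensities run from `step` to 100 % in increments
    of `step`, each emotion/intensity pair appears `repeats` times, and the
    full list is shuffled so intensity never increases predictably.
    """
    intensities = range(step, 101, step)  # 10, 20, ..., 100 when step is 10
    trials = [
        (emotion, intensity)
        for emotion in emotions
        for intensity in intensities
        for _ in range(repeats)
    ]
    # A dedicated generator keeps the order reproducible for a given seed.
    rng = random.Random(seed)
    rng.shuffle(trials)
    return trials
```

Shuffling the full list, instead of presenting intensities in ascending order, is what guards against the anticipation bias noted for earlier work.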

Methods,  Procedure,  and  Results 

[can  be  found  in  Bass,  2014,  attached  in  appendix] 

Conclusion 

The  goal  of  the  study  was  to  investigate  whether  Chernoff  face  stimuli  could  serve  as  ambient 
(i.e.,  relatively  resource-free)  indicators  of  quantitative  information,  using  a  dual-task  paradigm. 
In  general,  we  hypothesized  that  sad  face  emotion  decoding  would  show  age-related  differences 
but  happy  faces  would  be  immune  to  age-related  differences.  This  was  based  on  the  literature 
indicating  positive  facial  expressions  provided  a  decoding  advantage  (i.e.,  are  more  easily 
decoded;  Bartneck  &  Reichenbach,  2005;  Calvo  &  Lundqvist,  2008;  Rellecke,  2011),  and  the 
finding  that  older  adults  could  decode  positive  facial  expressions  as  accurately  as  younger  adults 
(Orgeta  &  Phillips,  2007).  However,  we  found  that  the  relationship  between  younger  and  older 
adults’  decoding  accuracy  did  not  significantly  change  due  to  facial  expression  condition. 
Therefore,  there  was  an  age-related  difference  in  decoding  accuracy  in  the  happy  face  condition. 

However,  when  collapsing  across  age  group,  participants  had  higher  decoding  accuracy  when 
they  were  presented  with  happy  facial  expressions.  This  finding  supports  a  general  “happy  face 
advantage”  and  suggests  that  when  compared  to  sad  Chernoff  facial  expressions,  happy  Chernoff 
facial  expressions  are  more  advantageous  for  decoding.  In  terms  of  using  a  Chernoff  face  for  the 
display of quantitative information, the happy facial expression was shown to be an
overall more decodable stimulus. This finding corroborates previous research that shows evidence
of more accurate happy face decoding (Hess, 1997).

Aim 3: Investigate how anthropomorphically designed automation affects automation
error  attributions 

Study  1:  Investigating  Older  Adults’  Trust,  Causal  Attributions,  and  Perception  of  Capabilities 
in  Robots  as  a  Function  of  Robot  Appearance,  Task,  and  Reliability 

Executive  Summary 

The purpose of the current study is to examine the extent to which the appearance, task, and
reliability of a robot are susceptible to stereotypic thinking. Stereotypes can influence the types of
causal  attributions  that  people  make  about  the  performance  of  others.  Just  as  causal  attributions 
may  affect  an  individual’s  perception  of  other  people,  it  may  similarly  affect  perceptions  of 
technology.  Stereotypes  can  also  influence  perceived  capabilities  of  others.  That  is,  in  situations 
where  stereotypes  are  activated,  an  individual’s  perceived  capabilities  are  typically  diminished. 
The  tendency  to  adjust  perceptions  of  capabilities  of  others  may  translate  into  levels  of  trust 
placed  in  the  individual’s  abilities.  A  cross-sectional  factorial  survey  using  video  vignettes  will 
be  utilized  to  assess  young  adults’  and  older  adults’  attitudes  toward  a  robot’s  behavior  and 
appearance.  We  hypothesize  that  a  robot’s  older  appearance  will  result  in  lower  levels  of  trust, 
more  dispositional  attributions,  and  lower  perceptions  of  capabilities  while  high  reliability  should 
positively  impact  trust. 

Introduction 

When  interacting  with  technology,  people  focus  on  human-like  qualities  of  the  technology  more 
than the asocial nature of the interaction (Reeves & Nass, 1996; Nass & Moon, 2000), attributing
human-like  qualities  such  as  personality,  mindfulness,  and  social  characteristics.  The  attribution 
of  human-like  qualities  makes  technology  susceptible  to  stereotyping  based  on  appearance  and 
etiquette  (e.g.,  Nass  &  Lee,  2001;  Parasuraman  &  Miller,  2004;  Eyssel  &  Kuchenbrandt,  2012). 
For  example,  when  a  male  or  female  anthropomorphic  computerized  aid  was  included  in  a  trivia 
task,  participants  were  more  likely  to  trust  the  male  aid’s  suggestions  and  ranked  the  female  aid 
as  less  competent  (Lee,  2008). 

The  purpose  of  the  current  study  is  to  examine  the  extent  to  which  the  appearance,  task,  and 
reliability of a robot are susceptible to stereotypic thinking. The theoretical relevance is that the
results  of  this  study  will  inform  the  limits  of  stereotypic  thinking  by  investigating  whether 
stereotypes  are  applied  to  robots.  The  practical  relevance  is  that  the  current  study  may  inform  the 
design  of  robots  to  enhance  human-robot  interaction,  particularly  for  older  adults  who  tend  to  be 
less accepting of technological aids than other age groups (Czaja et al., 2006).

Stereotypes  and  Aging 

In  order  to  make  efficient  social  judgments  about  others,  individuals  rely  on  the  use  of  heuristics. 
One  example  heuristic  involves  placing  an  individual  into  a  pre-determined  schema  (i.e.,  a 
stereotype). Stereotypes are cognitive shortcuts that result in impressions of others (e.g.,
Ashmore & Del Boca, 1981). Therefore, older adults may be more likely than younger adults to
apply stereotypes when they do not have other sources of information available to them (i.e.,
under  situations  of  ambiguity). 

Stereotypes  are  more  likely  to  be  activated  in  domains  that  are  inconsistent  with  prescriptive 
societal gender or age roles (e.g., Kuchenbrandt, Haring, Eichberg, Eyssel, & Andre, 2014). For
example,  individuals  perceived  a  female-voiced  computer  to  be  more  informative  about  romantic 
relationships  than  the  male-voiced  computer  (Nass,  Moon,  &  Green,  1997).  Although  gender 
stereotypes  have  been  studied  using  anthropomorphic  technological  aid  paradigms,  aging 
stereotypes have been investigated to a lesser degree within this context. Pak, McLaughlin, &
Bass  (2014)  examined  whether  the  physical  appearance  of  an  anthropomorphic  aid  would 
activate  stereotypic  thinking  and  affect  individuals’  trust  in  the  aid.  Using  a  factorial  design,  Pak 
et  al.  manipulated  the  technological  aid’s  gender  and  age  (younger,  older)  as  well  as  participants’ 
perceptions  of  the  reliability  of  the  automation.  Participants  were  told  that  the  automation  was 
either  45%,  70%,  or  95%  reliable.  However,  the  automation  always  provided  a  correct  answer 
during  testing.  The  task  in  this  study  was  a  health  behaviors  test  regarding  participants’ 
knowledge  about  diabetes.  Before  beginning  the  task,  participants  were  told  that  the  automated 
aid was a smartphone application recommended by a doctor and designed to help people make the
best decisions about diabetes. As the participants answered each question, the decision aid
smartphone app would appear on the screen and the agent would recommend a correct answer. All of
the agents were dressed as doctors. Participants rated their subjective trust in the automation and
whether they would actually use the advice of the application on a 1-7 Likert scale.

Pak, McLaughlin, & Bass (2014) found that both younger and older adult participants trusted the
older  anthropomorphic  aids  more  than  the  younger  aids,  the  male  aids  more  than  the  female  aids, 
and  more  reliable  applications  than  less  reliable  applications.  However,  stereotypic  thinking  was 
activated  when  perceptions  of  reliability  were  low  or  ambiguous.  When  the  app  had  low 
reliability, the younger female aid was trusted less than the younger male aid. Also, under
medium  reliability,  the  older  female  aid  was  trusted  less  than  the  older  male  aid.  These  results 
suggest  that  trust  in  automation  can  be  influenced  by  physical  appearance  (i.e.,  gender  and 
perceived  age)  of  the  technology.  These  results  also  further  support  the  notion  that  technology  is, 
like  humans,  also  susceptible  to  stereotyping. 

Physical  appearance  is  known  to  play  a  large  role  in  the  activation  of  aging  stereotypes.  The  link 
between  physical  characteristics  and  stereotypes  has  been  well  established  in  the  social  cognition 
literature (Brewer & Lui, 1984; Hummert, 1994; Hummert, Garstka, & Shaner, 1997). Within
this  context,  facial  features  are  considered  to  be  the  main  source  of  information  used  in  order  to 
activate  stereotypes.  Hummert  et  al.  (1997)  found  that  negative  age  stereotypes  were  associated 
with  the  perception  of  advanced  age  through  facial  photographs.  Overall,  these  findings  suggest 
that  physical  cues  are  major  indicators  within  the  context  of  social  judgments. 

Stereotypes  about  older  adults,  although  pervasively  negative,  can  be  multidimensional  in  the 
right  context.  People  hold  both  positive  and  negative  stereotypes  about  older  adults  (Hummert, 
1993).  When  adults  of  all  ages  completed  a  trait  card-sorting  task  where  they  were  asked  to 
generate  traits  they  associated  with  older  adults,  Hummert  and  colleagues  (1994)  found 
approximately  10  different  aging  stereotypes,  including  positive  ones  like  the  “golden  ager”  who 
leads  an  active  and  engaged  lifestyle.  Although  many  stereotypes  are  held  in  common  by  people 
of all ages, aging stereotypes tend to become increasingly differentiated as people grow older
(Hummert,  1993;  Hummert  et  al.,  1994). 

Stereotypes  and  other  social  beliefs  can  influence  the  way  in  which  individuals  process 
information  in  order  to  form  social  judgments,  including  the  types  of  causal  attributions  that 
people  make  about  the  performance  of  others  (Fiske  &  Taylor,  1991).  When  trying  to  determine 
the  causality  of  an  event,  people  tend  to  use  two  types  of  information:  internal  or  dispositional 
qualities  of  the  individuals  involved  in  an  outcome  and  the  influences  of  the  situation  itself 
(Gilbert,  1993;  Krull,  1993;  Krull  &  Erikson,  1995).  Potential  biases  in  the  attribution  process 
can  occur  as  a  function  of  the  valence  of  the  situational  outcome,  the  degree  of  ambiguity  of  the 
situation  (or  of  the  information  given  about  causal  factors),  and  the  controllability  of  the  situation 
(Blanchard-Fields,  1994).  Blanchard-Fields  suggested  that,  in  general,  older  adults  are  most 
likely  to  make  dispositional  attributions  when  the  outcome  of  a  situation  was  negative  and  the 
actor’s  role  in  the  outcome  was  ambiguous.  When  personal  beliefs  about  another  individual  or 
situation are violated, older adults are also more likely to make dispositional attributions
of  blame  rather  than  situational  (Blanchard-Fields,  1996;  Blanchard-Fields,  Hertzog,  &  Horhota, 
2012).  Just  as  causal  attributions,  or  the  extent  to  which  behavior  is  attributed  to  situational  or 
dispositional  causes,  may  affect  an  individual’s  perception  of  other  people,  it  may  also  similarly 
affect  perceptions  of  technology.  For  example,  blaming  technology  for  unreliable  performance  is 
likely to induce less trust (Moray, Hiskes, Lee, & Muir, 1995; Madhavan, Wiegmann, &
Lacson, 2006). Attribution of fault has been studied in the automation literature and has been referred to
as automation bias (Mosier & Skitka, 1996). Automation bias has been defined “as a heuristic
replacement for vigilant information seeking and processing” (Mosier & Skitka, p. 202) which
results  in  increased  omission  errors  and  commission  errors. 

Expectations  of  performance  outcomes  are  influenced  by  stereotypes.  Adults  of  all  ages  expect 
memory performance to decline with age (Lineweaver & Hertzog, 1998). Similarly, older
adults’  abilities  are  perceived  negatively  in  domains  involving  memory  (Kite  &  Johnson,  1988; 
Kite,  Stockdale,  Whitley  &  Johnson,  2005)  and  physical  well-being  (Davis  &  Friedrich,  2010). 

In memory-taxing situations, older adults are perceived as being less credible and less accurate
(Muller-Johnson,  Toglia,  Sweeney,  &  Ceci,  2007).  The  tendency  to  adjust  perceptions  of 
capabilities  of  others  based  on  appearance  may  translate  into  levels  of  trust  placed  in  the 
individual’s  abilities. 

Trust  in  Automation 

Trust  in  technological  agents  is  important  because  it  affects  an  individual’s  willingness  to  accept 
a robot’s input, instructions, or suggestions (Lussier, Gallien, & Guiochet, 2007). For example,
Muir  and  Moray  (1996)  found  a  strong  positive  relationship  between  adults’  level  of  trust  in  an 
automated  system  and  the  extent  to  which  they  allocated  control  to  the  automated  system. 
Interestingly,  Muir  (1987)  suggests  that  people’s  trust  in  technology  is  affected  by  factors  that 
are also the basis of interpersonal trust. Trust in automation is thought to develop over time
(Maes, 1994), suggesting that trust is influenced by past experiences with the technology. For
example, Merritt and Ilgen (2008) describe dispositional trust as the trust placed in a person or
automation during a first encounter, before any interaction has taken place, while history-based
trust  reflects  the  prior  experience  a  person  has  with  another  person  or  automation. 

Performance-based factors have a large influence on perceived trust in HRI (Brule, Dotsch,
Bijlstra, Wigboldus, & Haselager, 2014). In fact, a recent meta-analysis suggests that a robot’s
task performance was the most important factor in adults’ trust in robots (Hancock et al., 2011).
That  is,  if  the  robot  performs  reliably,  the  human  will  exhibit  greater  trust  towards  the  robot.  The 
same  meta-analysis  found  that  behavior,  proximity,  and  size  of  the  robot  also  affect  trust  to  a 
lesser  extent.  However,  human-automation  trust  literature  suggests  that  appearance  can  have 
reliable effects on trust (Pak, Fink, Price, Bass, & Sturre, 2012). Indeed, studies in the social
literature  have  found  that  people  often  judge  an  individual’s  levels  of  trustworthiness  based  on 
facial  appearance  (Oosterhof  &  Todorov,  2008)  and  that  trust  judgments  can  be  formed  after 
only  a  brief  exposure  (100  ms)  to  a  face  (Willis  &  Todorov,  2006).  It  is  also  important  for  the 
robot’s  appearance  to  be  compatible  with  its  function  at  face  value.  Goetz,  Kiesler,  &  Powers 
(2003)  found  that  people  are  more  likely  to  accept  a  robot  when  its  appearance  matches  its 
perceived  capabilities.  This  is  thought  to  be  the  case  because  when  there  is  a  high  level  of 
compatibility between appearance and functionality, users’ expectations are confirmed, boosting
confidence  in  the  robot’s  performance.  However,  when  appearance  and  capabilities  are 
incompatible,  user  expectations  are  violated,  which  can  result  in  lower  levels  of  trust  (Duffy, 
2003). 

Because human-robot interaction is a new field of study, there are many gaps in the literature,
especially regarding the social influences on HRI. First, although there is evidence to suggest
that  stereotypes  can  affect  performance  and  interactions  with  anthropomorphized  technological 
aids,  we  do  not  know  how  pre-existing  age  stereotypes  will  affect  HRI.  Next,  it  is  unclear  how 
trust  might  be  moderated  by  task  type  and  reliability.  Although  the  automation  literature  suggests 
that  reliability  can  influence  trust,  to  our  knowledge  the  relationship  between  robot  task  domain 
and  trust  has  not  yet  been  investigated.  Finally,  how  does  stereotyping  technology  affect 
perception  of  capabilities  and  the  causal  attributions  made  about  performance? 

The  Current  Study 

The  purpose  of  this  study  is  to  better  understand  the  factors  that  influence  older  adults'  trust  in 
robots.  Specifically,  we  are  investigating  whether  the  robots’  appearance,  task  domain,  and 
reliability  of  the  robot’s  performance  influence  trust  in  the  automation.  A  cross-sectional 
factorial  survey  study  will  be  utilized  using  video  vignettes  to  assess  participants’  attitudes 
towards  the  robots’  behavior  and  appearance.  Each  vignette  will  include  manipulations  of  the  age 
of  the  robot,  the  domain  of  the  collaborative  task,  and  the  reliability  of  the  robot’s  performance. 
Dependent  measures  will  include  the  level  of  trust  participants  exhibit  toward  the  robot,  causal 
attributions  regarding  the  robot’s  performance,  and  perceived  capabilities  of  the  robot. 

It  is  hypothesized  that  manipulating  a  robot’s  appearance,  level  of  reliability,  and  the  task  type 
will  have  an  effect  on  the  level  of  trust  that  an  older  adult  exhibits  toward  a  robot,  the  causal 
attributions  that  the  individual  makes  about  the  robot’s  performance,  and  people’s  perceptions  of 
the  capabilities  of  the  robot.  Specifically,  trust  in  the  robot  should  be  highest  when  the  task  is 
stereotypically  congruent  with  the  robot’s  appearance  (e.g.,  a  younger  adult  performing  a 
cognitive  task  instead  of  an  older  adult  performing  a  cognitive  task)  and  its  performance  is 
reliable. This is hypothesized because appearance influences people’s trust in automation (Pak,
Fink, Price, Bass, & Sturre, 2012) and aging stereotypes are less likely to be activated while
interacting  with  the  younger  robot.  The  attributions  about  the  robot’s  performance  may  be  more 
dispositional  when  reliability  is  low  and  the  task  is  incongruent  with  the  robot’s  appearance.  This 
is  because  older  adults  are  more  likely  to  make  dispositional  (i.e.,  internal)  attributions  of  blame 
when  an  outcome  of  an  event  is  perceived  as  negative  (the  unreliable  condition)  and  when  their 
beliefs  are  violated  (i.e.,  when  an  older  looking  robot  performs  the  cognitive  and  physical  tasks; 
Blanchard-Fields,  Hertzog,  &  Horhota,  2012).  Perceived  capabilities  of  the  robot  are 
hypothesized  to  depend  on  the  robot’s  appearance.  That  is,  capability  ratings  are  expected  to  be 
higher when the younger looking robot performs the tasks, and ratings are expected to be lower
when  an  older  looking  robot  performs  the  tasks.  This  is  expected  because  adults’  capabilities  in 
cognitive  and  physical  domains  are  expected  to  decline  with  age  (Kite,  Stockdale,  Whitley,  & 
Johnson,  2005;  Davis  &  Friedrich,  2010).  Task  domain  will  be  treated  as  an  exploratory  variable. 
However, based on automation trust literature suggesting that trust in a robot’s capabilities might
depend  on  the  domain  in  which  they  are  placed  (e.g.,  industry,  entertainment,  social;  Schaefer, 
Sanders,  Yordon,  Billings,  &  Hancock,  2012),  it  is  hypothesized  that  there  will  be  a  main  effect 
of  task  domain  such  that  participants  will  have  more  trust  in  the  robot  and  have  higher  ratings  of 
perceived  capabilities  when  the  robot  performs  physical  tasks. 

Methods,  Procedures,  Results 

[methods  can  be  found  in  Branyon  (2015),  preliminary  results  in  Branyon  &  Pak  (2015)  attached 
in  appendix] 

Conclusion 

This  study  offers  a  unique  contribution  by  investigating  a  well-researched  paradigm  from  the 
social  cognition  and  aging  literatures,  stereotypes,  and  applying  it  to  a  novel  field,  HRI. 
Preliminary  analyses  show  that  although  there  were  no  main  effects  of  robot  age  on  the 
dependent  variables,  age  moderated  the  effect  of  task  on  the  robot’s  perceived  capabilities  as 
well  as  the  types  of  causal  attributions  individuals  made  about  the  robot’s  performance.  In 
general,  the  robot  was  perceived  more  positively  when  completing  a  fine  motor  task  or  light 
cognitive  tasks  than  when  it  performed  a  gross  motor  task  (i.e.,  moving  boxes).  Reliable 
cognitive  task  performance  yielded  the  highest  dispositional  attribution  ratings  regardless  of 
robot  appearance.  This  finding  suggests  that  people  might  attribute  outcomes  differently  in  the 
context  of  human-robot  interaction  than  in  human-human  interaction.  These  findings  emphasize 
the  importance  of  task  type  on  older  adults’  perceptions  of  robots.  In  this  context,  users  trust 
robots  that  perform  cognitive  and  light  motor  tasks  more  than  ones  that  perform  gross  motor 
tasks.  It  is  also  important  to  select  the  appropriate  age  appearance  for  robots  based  on  the  tasks 
they  are  to  perform.  Tentatively,  the  results  suggest  selecting  a  younger  appearance  for  a  robot 
that  will  perform  cognitive  tasks. 

Previous research has shown that gender stereotypes, elicited by the appearance of the anthropomorphic technology, can
alter  perceptions  of  system  reliability.  The  current  study  examined  whether  stereotypes  about  the  perceived  age  and  gender 
of anthropomorphic technology interacted with reliability to affect trust in such technology. Participants included a
cross-section of younger and older adults. Through a factorial survey, participants responded to health-related vignettes containing
anthropomorphic  technology  with  a  specific  age,  gender,  and  level  of  past  reliability  by  rating  their  trust  in  the  system.  Trust 
in  the  technology  was  affected  by  the  age  and  gender  of  the  user  as  well  as  its  appearance  and  reliability.  Perceptions  of 
anthropomorphic  technology  can  be  affected  by  pre-existing  stereotypes  about  the  capability  of  a  specific  age  or  gender. 

Practitioner  Summary:  The  perceived  age  and  gender  of  automation  can  alter  perceptions  of  the  anthropomorphic 
technology  such  as  trust.  Thus,  designers  of  automation  should  design  anthropomorphic  interfaces  with  an  awareness  that 
the  perceived  age  and  gender  will  interact  with  the  user’s  age  and  gender. 

Keywords:  automation;  trust;  aging;  stereotypes;  mobile;  health 

1.  Anthropomorphic  technology  can  elicit  stereotypes 

Interactive  computer  systems  that  exhibit  human-like,  or  anthropomorphic,  traits  can  lead  users  to  perceive  and  treat  them 
differently  than  non-human-like  systems  (Nass,  Steuer,  and  Tauber  1994).  Thus,  it  is  imperative  to  understand  how  users’ 
perceptions  of  the  system  might  be  affected  by  their  social  reactions  to  anthropomorphic  technology.  One  way  in  which  a 
system  may  elicit  social  reactions  is  by  eliciting  stereotypes  (Yee,  Bailenson,  and  Rickerson  2007). 

Stereotypes  are  preconceptions  about  the  traits,  behaviour,  or  abilities  of  a  group  and  can  set  expectations  of  a 
stereotyped  individual.  Stereotypes  can  have  both  negative  and  positive  connotations  that  may  be  inconsistent  with  real 
group attributes but provide adaptive value because they filter and organise incoming information, thereby easing
processing  and  interpretation  (Hilton  and  von  Hippel  1996).  Stereotypes  can  be  activated  and  applied  with  or  without 
conscious  awareness  (Banaji,  Hardin,  and  Rothman  1993;  Greenwald  and  Banaji  1995).  Unfortunately,  when  the  stereotype 
is highly simplified or inaccurate, it can lead to errors in perceptions and behaviour.

Nass,  Steuer,  and  Tauber  (1994)  tested  whether  users  would  apply  gender-related  stereotypes  when  interacting  with  a 
computer that exhibited a gender. Their participants were first tutored by a computer on a specific topic. Tutored topics were
either  stereotypically  female  (love  and  relationships)  or  stereotypically  male  (computers  and  technology).  They  then  moved 
to  a  non-gendered  computer  for  testing  and  to  a  gendered  computer  for  evaluation  of  their  test  responses.  When  gender  of 
the tutor matched the stereotypic topic, participants rated it as a better teacher. This finding was echoed by Lee (2003) in a
study  where  participants  answered  difficult  trivia  questions  that  were  either  stereotypically  feminine  or  masculine.  After 
answering  the  trivia  question,  participants  viewed  a  female  or  male  computerised  agent  that  presented  its  own  answer  and 
then  were  allowed  to  change  their  answer.  More  participants  changed  their  answers  to  agree  with  the  agent  when  the  gender 
of  the  agent  matched  the  stereotypical  topic. 

Stereotype  activation  for  computerised  agents  can  also  interact  with  individual  differences,  such  as  physical 
characteristics. Qiu and Benbasat (2009) found that an anthropomorphic decision aid significantly increased perceptions of
social  presence  and  led  to  increased  trust  of  the  agent.  The  strength  of  these  effects  was  influenced  by  the  degree  to  which 
the  decision  aid  agent  was  similar  to  the  user  on  a  visible  factor,  such  as  ethnicity.  The  link  between  trust  and  apparent 
physical  characteristics  was  explained  via  similarity-attraction  theory  that  predicted  that  people  would  be  more  attracted  to 
those  similar  to  them  (Byrne  1971).  The  user  may  have  attributed  their  attraction  to  a  similar  ethnicity  as  trustworthiness  of 
the  agent. 

In another example of the moderating role of individual differences in susceptibility to anthropomorphic effects,
susceptibility  to  flattery  (insincere  praise)  depended  on  the  level  of  computer  experience  of  the  user  (Johnson,  Gardner,  and 
Wiles  2004).  Johnson,  Gardner,  and  Wiles  found  that  susceptibility  to  flattery  from  a  computer  depended  on  the  user’s 
experience  level  with  computers  -  the  judgments  of  highly  experienced  users  were  more  affected  by  flattery  than  less 
experienced  users.  Furthermore,  Lee  (2010)  found  that  people  who  exhibited  less  analytical  and  more  intuitive  cognitive 
style  were  more  susceptible  to  flattery  from  a  computer. 

In  summary,  stereotypes  can  affect  user  perceptions  of  a  computer  or  automated  aid  and  can  be  moderated  by  individual 
differences. Some of the aids described in the previous studies were forms of automation that functioned in a
decision-support capacity; thus, some automation bias may be based on stereotypes (Skitka, Mosier, and Burdick 1999). However, no
research  has  explicitly  examined  how  these  factors  might  interact  with  machine-related  factors  of  automation,  such  as 
reliability  of  the  automation  or  how  various  activated  stereotypes  might  interact  (e.g.  age  and  gender). 

2.  Age  stereotypes  in  technology?

Age is one of the first and most salient attributes noticed about a person (Fiske 1998), which suggests the same may be true of
anthropomorphic  agents.  Furthermore,  stereotypes  about  age  are  stronger  (Kite,  Deaux,  and  Miele  1991)  and  more  complex 
than gender stereotypes (Kite et al. 2005). In Kite, Deaux, and Miele’s study assessing age and gender stereotypes using free
response,  participants  viewed  a  younger  (35-year-old)  male  or  female  and  older  (65-year-old)  male  or  female  and  provided 
characteristics  of  the  target  person.  Analysis  showed  that  when  negative  stereotypes  were  generated,  they  were  much  more 
likely to be due to the age of the target than the gender. Finally, according to the similarity-attraction hypothesis (Qiu and
Benbasat  2009),  older  and  younger  adults  should  exhibit  positive  anthropomorphic  effects  with  automation  that  matches 
their  age  group.  However,  it  may  also  be  that  an  older-looking  automated  agent  may  prime  negative  stereotypes  about  age, 
particularly  when  the  reliability  of  the  automation  is  perceived  to  be  low.  This  may  explain  why  a  previous  study  found  that 
a  young  female  agent  enhanced  younger  adults’  trust  in  automation  but  not  older  adults’  when  participants  interacted  with  a 
health  decision  aid  (Pak  et  al.  2012).  The  authors  hypothesised  that  the  dissimilarity  between  a  younger  female  decision 
agent  and  an  older  participant  may  have  muted  any  potential  anthropomorphic  effect  on  trust  due  to  violation  of  the 
similarity-attraction hypothesis. An alternative explanation is that older adults hold negative stereotypes of the capabilities of younger,
female  doctors  but  younger  adults  do  not. 

3.  Age  and  gender  stereotypes  of  physicians

People  hold  stereotypes  that  older  workers  have  lower  ability,  are  less  motivated,  and  are  less  productive  than  younger 
workers (Posthuma and Campion 2009). Older workers are also seen as less adaptable to changing work situations and
uncertainty  than  younger  workers  (DeArmond  et  al.  2006).  Although  aging  studies  show  that  these  views  may  be 
exaggerated  (e.g.  see  Czaja  and  Sharit  1998),  they  are  widely  held  by  people  of  all  ages  and  affect  workplace  hiring 
decisions  and  evaluations  (DeArmond  et  al.  2006;  Posthuma  and  Campion  2009).  Negative  age  stereotypes  about  older 
workers are even held by older adults themselves (Rosen and Jerdee 1976; Finkelstein and Burke 1998; Wrenn and Maurer
2004). Finally, these stereotypes may be activated without awareness (Devine 1989; Perdue and Gurtman 1990; Banaji and
Hardin  1996). 

Activation of age stereotypes may be moderated by individuating past behaviour or context (Kunda and
Sherman-Williams 1993). Individuating information such as context (e.g. interacting with a doctor) may determine which aspect of a
stereotype  gets  activated  (Casper,  Rothermund,  and  Wentura  2011).  Knowing  the  occupation  of  an  individual  is  a  type  of 
individuating  information  that  seems  to  alter  some  negative  age  stereotypes.  For  example,  although  some  occupations  seem 
more  negatively  age  stereotyped  (e.g.  Cleveland  and  Hollman  1990),  the  occupation  of  physician  is  moderately  seen  as  a 
stereotypically  older  male  occupation  (Singer  1986)  even  though  it  is  an  occupation  that  may  require  adaptability  and  is 
faced with uncertainty. In contrast, when stereotypes of doctors were more recently assessed (Shah and Ogden 2006),
younger  female  doctors  were  perceived  as  having  better  personal  manner  and  technical  skill  than  older  doctors  of  either 
gender.  The  scant  literature  on  physician  age  stereotypes  seems  to  suggest  that  the  stereotype  of  older  doctors  is  less 
negative  than  the  stereotype  for  older  adults  in  general,  but  still  present  (McKinstry  and  Yang  1994),  demonstrating  the 
power  of  individuating  information  on  the  otherwise  powerful  age  stereotype. 

In  summary,  person-judgment  based  on  stereotypes  can  depend  on  individuating  information,  including  profession,  past 
performance  (i.e.  reliability),  gender,  and  age.  Similarly,  assessment  of  computer-based  automation  with  human-like 
characteristics  may  also  be  subject  to  pre-existing  stereotypes  consistent  with  the  human-like  qualities  (e.g.  age,  gender). 
Anthropomorphic  automation  with  ambiguous  reliability  may  be  more  likely  to  activate  pre-existing  stereotypes.  That  is, 
when  automation  is  unambiguously  reliable  or  unreliable,  stereotypes  should  not  affect  perceptions.  But  when  automation  is 
ambiguous,  stereotypes  will  affect  perceptions  of  the  automation  such  as  trust.  The  idea  that  imperfect  automation  may 
engender the expression of implicit attitudes has been suggested by other automation researchers (Lee and See 2004;
Merritt,  Heimbaugh,  and  LaChapell  2012). 

4.  Anthropomorphism  and  automation  characteristics 

Studies of human-automation interaction have demonstrated that many factors related to the person, automated system,
and  task  interact  to  determine  trust  in  and  performance  with  automation.  For  example,  individual  differences  in  attitudes 
towards  automation  (e.g.  Mosier  et  al.  1998;  Dzindolet  et  al.  2003;  Merritt  and  Ilgen  2008)  interacted  with  machine 
characteristics  such  as  reliability  and  error  types  (e.g.  Madhavan,  Wiegmann,  and  Lacson  2006;  Rovira,  McGarry,  and 
Parasuraman  2007)  and  task  or  situational  factors  such  as  workload  (e.g.  Rottger,  Bali,  and  Manzey  2009)  to  affect 
behaviour  with  and  perceptions  of  automation. 

Research investigating the influence of anthropomorphic aspects specifically on human-automation interaction
(Parasuraman and Miller 2004; Pak et al. 2012) found that various implementations of anthropomorphism such as etiquette
(Bickmore  2011;  Zhang,  Zhu,  and  Kaber  2011)  affected  perceptions  of  trust  and  automation  behaviour.  For  example,  in 
aircraft  engine  diagnosis,  the  automation  either  presented  advice  in  a  rude  or  polite  manner  (Parasuraman  and  Miller  2004). 
As expected, perceived trust and performance in the diagnosis task were better when the automation was 80% reliable
compared  to  60%  reliable.  However,  engine  diagnosis  performance  and  trust  with  polite  but  less  reliable  automation  was  the 
same  as  rude  but  highly  reliable  automation.  It  was  not  speculated  why  etiquette  would  interact  with  reliability  but  it  may  be 
that  politeness  affected  an  internal  belief  that  artificially  adjusted  expectations  of  the  automation  that  influenced  attributions 
of  responsibility  (e.g.  Marakas,  Johnson,  and  Palmer  2000). 

Thus,  behaviour  with  anthropomorphic  automation  is  affected  by  how  it  is  perceived  in  addition  to  its  reliability. 
The literature in computer-mediated communication has demonstrated the computers-as-social-actors effect (e.g. stereotype
elicitation,  susceptibility  to  flattery)  as  well  as  the  moderating  influence  of  individual  differences  (e.g.  cognitive  style, 
ethnicity).  Complementing  these  findings,  the  automation  literature  has  shown  that  overt  anthropomorphic  elements 
(etiquette,  human-like  appearance)  in  automation  can  interact  with  machine-related  factors  such  as  automation  reliability  to 
influence  trust  and  performance.  The  conceptual  link  between  these  two  literatures  is  the  finding  that  implicit  attitudes  about 
automation  itself,  or  beliefs  about  the  capabilities  of  automation  held  without  conscious  awareness,  significantly  affect  trust 
in  automation  but  only  when  reliability  of  the  automation  was  uncertain  (Lee  and  See  2004;  Merritt,  Heimbaugh,  and 
LaChapell  2012). 

Merritt,  Heimbaugh,  and  LaChapell  (2012)  theorised  that  implicit  general  attitudes  about  automation  affected  the 
propensity  to  trust  machines  and  an  individual’s  trust  in  a  specific  automated  system.  Perceptions  of  the  behaviour  of  any 
automation  will  be  filtered  through  these  explicit  and  implicit  pre-existing  beliefs  about  automation  (Dzindolet  et  al.  2002). 
Merritt  et  al.  found  that  when  automation  reliability  was  ambiguous,  implicit  beliefs  about  automation  and  stereotypes  were 
more  influential  in  determining  trust  than  explicit  beliefs.  Presumably,  in  the  face  of  ambiguity,  individuals  made 
attributions  that  were  consistent  with  their  implicit,  schematic  pre-existing  beliefs  about  automation.  This  paralleled 
findings  from  the  social  cognition  literature  that  stereotypic  reasoning  was  common  when  an  individual  was  faced  with 
conflicting or ambiguous information (Kunda and Thagard 1996).

Reframing the results of Parasuraman and Miller (2004) in light of the findings of Merritt, Heimbaugh, and LaChapell
(2012), it may be that when automation performance was ambiguous or of low reliability, participants fell back on their newly
formed positive implicit beliefs about the automation (that the automation was polite) and made more
situational rather than dispositional attributions (i.e. attributed fault to the situation, not the automation). For the present
study, Merritt et al.’s and Parasuraman and Miller’s studies are crucial for several reasons. First, they showed that implicitly
held beliefs influence explicit perceptions of trust in automation. Second, the implicit attitudes interacted with automation
reliability to determine trust and behaviour: factors at the person level (stereotypes) and task level (automation reliability)
interacted to affect judgments and perceptions of technology. There is a wealth of research examining the role of etiquette
in automation perceptions (Hayes and Miller 2011), but the current work extends this concept by proposing that another type of implicitly
held perception (stereotypes) may affect how users perceive automation. The present study extended previous work on
gender stereotypes in automation behaviour by examining another potential stereotype: age.

5.  Overview  of  the  study 

Using  participants  in  younger  and  older  adult  age  groups,  we  collected  judgments  of  trust  of  a  simulated  agent  embedded 
within  a  decision  aid  that  varied  in  gender,  age,  and  reliability  using  a  factorial  survey  with  concrete  health-related  vignettes. 
Following  the  social  cognition  literature,  we  expected  that  age  and  gender  stereotypes  would  most  affect  trust  in  the  decision 
aid  when  system  performance  was  ambiguous,  but  that  there  would  be  different  effects  for  different  age  groups  and  genders  of 
users.  Specific  research  aims  were  as  follows:  (1)  Determine  the  amount  of  variance  in  trust  due  to  within-person  variation 
compared to between-person variation, (2) Determine how age of the agent, gender of the agent, and reliability of the decision
aid  agent  affected  judgments  of  trust  in  the  aid,  and  (3)  Determine  how  individual  differences  such  as  age  and  gender  of  the 
participant  affected  trust  ratings  of  various  decision  aids.  The  results  informed  basic  knowledge  of  how  differing  age  and 
gender  groups  responded  to  stereotypes  as  well  as  informing  the  design  of  decision  aids  targeting  particular  groups  of  users. 

We  presented  scenarios  involving  a  decision  aid  (a  smartphone  ‘app’)  for  diabetes  management  via  a  factorial  survey. 
The  decision  aid  contained  a  simulated  anthropomorphised  agent.  Factorial  surveys  have  been  widely  used  to  examine  how 
beliefs,  judgments,  and  decision-making  are  influenced  by  situational  factors  (Rossi  and  Anderson  1982).  Specific  factors  of 
the  scenario  were  manipulated  (in  a  factorial  manner)  and  the  participant  rated  all  combinations  of  factors.  The  agent  was  a 
health-care  provider  offering  advice  on  a  specific  diabetes-related  dilemma.  Because  our  dependent  variable  (trust)  was  a 
social  judgment  about  a  situation,  a  factorial  survey  was  an  ideal  way  to  measure  the  influence  of  manipulated  variables 
(age,  gender,  reliability  of  automation)  as  well  as  individual  differences  of  the  participants  (Rossi  and  Anderson  1982;  Hox, 
Kreft,  and  Hermkens  1991). 


5.1  Method 

5.1.1  Participants 

Sixty younger adults and 47 older adults completed the study. The mean age of the younger group was 18.6 (SD = 0.9) and that of the
older group was 72.7 (SD = 5.3). Younger adults were undergraduate college students whereas older participants were
independently  living,  community-dwelling  older  adults.  The  younger  participants  chose  to  receive  either  course  credit  or  $7  per 
hour  and  the  older  participants  received  $7  per  hour.  Descriptive  statistics  of  participant  characteristics  are  shown  in  Table  1. 


5.1.2  Materials 

Equipment. PC-compatible (Windows 7) computers running at 3.2 GHz with 4 GB of RAM were used with a 19-inch (48.3-cm)
LCD monitor set at a resolution of 1024 X 1280 pixels. Participants were seated approximately 18 inches from the
monitor  and  interacted  primarily  with  a  mouse  (on  the  preferred  side)  and  a  keyboard. 


Individual  difference  measures.  In  addition  to  participant  age  group  and  gender,  we  were  interested  in  two  individual 
difference  measures:  automation  complacency  and  prior  diabetes  knowledge.  The  Complacency  Potential  Rating  Scale 
(CPRS;  Singh,  Molloy,  and  Parasuraman  1993)  is  a  16-item  scale  designed  to  measure  complacency  towards  common  types 
of  automation  (e.g.  automated  teller  machines).  Participants  responded  to  the  extent  they  agreed  with  statements  about 
automation on a scale of 1 to 5. The CPRS score was a sum of these responses and ranged from 16 (low complacency potential)
to  80  (high  complacency  potential).  We  were  primarily  interested  in  CPRS  to  compare  our  sample  to  other  studies  that  show 
higher  complacency  potential  in  older  adults  (e.g.  Ho,  Wheatley,  and  Scialfa  2005).  Diabetes  knowledge  was  assessed  with 
the  Diabetes  Knowledge  Test  (DKT;  Fitzgerald  et  al.  1998).  The  23  questions  of  the  DKT  assessed  basic  knowledge  about 
diabetes  and  diabetes  management.  Computerised  versions  of  both  the  CPRS  and  DKT  were  used  in  this  study. 
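
As a rough illustration of the scoring rule just described, the following Python sketch sums 16 responses on the 1 to 5 scale into a total between 16 and 80. The function name `cprs_total` is hypothetical, and the sketch assumes a plain sum with no reverse-keyed or filler items, because the text describes only a sum of responses.

```python
def cprs_total(responses):
    """Sum 16 agreement ratings (each 1-5) into a CPRS total (16-80).

    Assumption: a plain sum, as described in the text; any reverse-keyed
    or filler items in the published scale are not modelled here.
    """
    if len(responses) != 16:
        raise ValueError("expected exactly 16 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return sum(responses)


# Hypothetical respondent who answers 3 ("neutral") to every item.
neutral_score = cprs_total([3] * 16)
```

Under this rule, the lowest possible total corresponds to answering 1 on every item and the highest to answering 5 on every item, which matches the 16 to 80 range reported for the scale.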


Task.  In  a  factorial  survey,  independent  variables  are  called  dimensions.  The  dimensions  are  orthogonal  and  can  have 
multiple  levels.  Orthogonal  dimensions  allowed  us  to  disentangle  the  unique  effects  of  each  dimension  on  judgments  of 
trust.  Our  dimensions  were  agent  gender  (male,  female),  agent  age  (younger,  older),  and  aid  reliability  (low,  medium,  high). 


Table 1.  Participant characteristics by age group and gender.

                            Younger adults (n = 60)             Older adults (n = 47)
                            Female (n = 37)   Male (n = 23)     Female (n = 25)   Male (n = 22)
                            Mean     SD       Mean     SD       Mean     SD       Mean     SD
Age                         18.49    0.72     18.74    1.15     72.00    5.29     73.45    5.27
CPRS (a)*                   43.73    3.83     43.00    5.38     48.52    5.31     46.09    4.04
Diabetes knowledge (b)*     11.68    2.02     11.48    2.52     14.24    2.81     13.41    2.84

*Significant age group difference, p < 0.05 (no significant gender differences).
(a) Scores could range from 16 indicating low complacency potential to 80 indicating high complacency potential (Singh, Molloy, and Parasuraman 1993).
(b) The DKT scores could range from 0 indicating no knowledge to 23 indicating high knowledge (Fitzgerald et al. 1998).

Table 2.  Dimensions (independent variables) of interest and resulting scenarios.

Scenario   Agent age (2)   Agent gender (2)   Stated reliability (3)
1          Young           Female             45%
2          Young           Female             70%
3          Young           Female             95%
4          Young           Male               45%
5          Young           Male               70%
6          Young           Male               95%
7          Older           Female             45%
8          Older           Female             70%
9          Older           Female             95%
10         Older           Male               45%
11         Older           Male               70%
12         Older           Male               95%

Note: Each scenario was presented twice resulting in 24 unique vignettes.


The  dimensions  of  interest,  their  levels,  and  the  factorial  combinations  resulting  in  12  possible  scenarios  are  shown  in  Table  2. 
Each  scenario  was  replicated  twice  to  create  24  unique  vignettes.  This  resulted  in  12  measurements  of  each  dimension 
per participant. In their review of the literature, Wickens and Dixon (2007) proposed that an automation reliability of about 70%
represented a critical inflection point: automation that was less than about 70% reliable was not relied upon, while reliabilities higher than 70% led to
complacency. For this reason, we used 70% as the medium level and chose low and high values well below and above it (45% and 95%,
respectively). Participants never actually experienced the levels of automation
reliability;  they  were  only  told  the  past  reliability  of  the  particular  app  that  was  shown.  No  matter  the  stated  past  reliability  of  an 
app,  the  advice  given  by  the  app  in  every  scenario  was  correct. 
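
The factorial structure described here can be sketched in a few lines of Python. This is only an illustration of how 2 x 2 x 3 dimensions cross into 12 scenarios and 24 vignettes; the function names, dictionary keys, and the seeded shuffle are assumptions for the example, not a description of the Real Studio program used in the study, and the sketch does not model how the two replications of a scenario differed.

```python
from itertools import product
import random

# Dimension levels as listed in Table 2.
AGENT_AGES = ("Young", "Older")
AGENT_GENDERS = ("Female", "Male")
RELIABILITIES = (45, 70, 95)  # stated past reliability, in percent


def build_scenarios():
    """Cross the three dimensions into the 12 scenarios of Table 2."""
    combos = product(AGENT_AGES, AGENT_GENDERS, RELIABILITIES)
    return [
        {"scenario": i, "agent_age": age, "agent_gender": gender, "reliability": rel}
        for i, (age, gender, rel) in enumerate(combos, start=1)
    ]


def build_vignettes(replications=2, seed=None):
    """Replicate every scenario and return the vignettes in random order."""
    vignettes = [
        dict(scenario, replicate=rep)
        for scenario in build_scenarios()
        for rep in range(1, replications + 1)
    ]
    random.Random(seed).shuffle(vignettes)  # hypothetical presentation order
    return vignettes


vignettes = build_vignettes(seed=2014)
```

Because the dimensions are fully crossed, every level of a two-level dimension appears in 12 of the 24 vignettes and every reliability level in 8, which is what allows the effect of each dimension to be separated from the others.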

The possible combinations of agent age and gender are shown in Figure 1. An example vignette (an older
female agent with high reliability) is illustrated in Figure 2. The diabetes dilemma was presented in the upper left of the screen. On the
right, a diagnostic smartphone app gave a possible solution via an agent. The smartphone was displayed approximately 30% larger than actual
size so that it was easily viewable from the seated distance. Also on the screen was a statement about the
past reliability of the particular app (low, medium, or high). In the lower third of the screen, participants rated on a Likert scale their
perception of trust and likelihood of following the advice of the aid.

The  diabetes  scenarios  were  used  in  a  prior  study  (Pak  et  al.  2012)  and  were  developed  by  adapting  questions  from  a 
diabetes  education  workbook  (Drucquer  and  McNally  1998),  and  reading  diabetes  support  forums.  They  were  designed  to 
represent  realistic  scenarios  that  someone  with  Type  II  diabetes  might  experience.  The  presentation  of  the  factorial  survey 
was  programmed  in  the  Real  Studio  environment  (Real  Software  2013). 


5.1.3  Design and procedure

The study was a 2 (age group of respondent: younger, older) X 2 (gender of respondent: male, female) X 2 (agent age:
young, old) X 2 (agent gender: male, female) X 3 (aid reliability: low, medium, high) mixed-model design, with
within-participant factors manipulated in the factorial survey. The first two variables (age group and gender of respondent) were
quasi-independent grouping variables while the last three were within-groups manipulations of the decision aid and agent.
The dependent variables were trust, likelihood of following advice, and diabetes knowledge.

Figure 1. Illustration of the four possible smartphone agent conditions (young female, young male, older female, older male).

Figure 2. Image of the factorial survey response screen.
[Text recoverable from the screen image. Reminder: ‘Remember: You are not trying to solve the problem below. You are giving your opinion on the smartphone app.’ Dilemma: ‘You were recently diagnosed with Type II diabetes and manage it with diet and medication. Your primary care doctor told you when your blood glucose gets low and you feel shaky, you should take a pinch of table sugar. However, you feel that taking a ‘pinch’ of table sugar is not a precise enough measurement so you want to eat something else. Your doctor approves this and reminds you that you need approximately 15 g of carbohydrates to substitute for one pinch of table sugar.’ A nutrition-facts table compared 2% milk, Lifesaver candy, Tropicana orange juice, and Nature’s Own white bread; its values are not legible. Question: ‘What should you eat instead of sugar to adjust your blood glucose levels?’ Reliability statement: ‘Past reliability of THIS app’s advice has been: 95%’. Rating items: ‘How much do you trust the smartphone helper?’ and ‘How likely are you to follow the smartphone helper’s recommendation?’, each answered on a numbered scale, followed by a ‘Next’ button.]

Participants  first  completed  a  diabetes  knowledge  questionnaire  administered  on  a  computer.  Next,  participants  started 
the  factorial  survey  and  were  told: 

You  are  playing  the  part  of  a  newly  diagnosed  diabetic.  Your  doctor  has  given  you  a  variety  of  different  smartphone  apps  that  may 
help  you  with  your  diabetes  care.  Your  task  involves  giving  us  your  opinion  of  the  different  smartphone  apps.  Just  like  many 
technological  aids,  the  different  apps  will  only  sometimes  seem  reliable.  Your  performance  is  not  being  tested  so  you  do  not  have  to 
try  to  solve  every  problem.  Instead,  you  are  making  judgments  of  the  smartphone  apps  as  quickly  as  possible. 

After  acknowledging  the  instructions  and  answering  any  remaining  questions  they  began  the  survey. 

In  the  survey,  participants  viewed  a  randomly  presented  vignette  and  were  asked  the  following  questions:  (1)  how  much 
they  trusted  the  smartphone  app  on  a  scale  from  1  (not  at  all)  to  7  (very  much),  and  (2)  whether  they  would  follow  or 
actually  use  the  advice  of  the  app  (1-7).  After  the  trust  and  decision  aid  usage  questions,  participants  were  also  asked  to 
briefly explain their ratings. To reinforce the notion that the smartphone app was a real decision aid and not just a
pre-computed image, the smartphone app did not reveal its answer for 1.5 seconds (in the interim the message, ‘Analysing the
scenario. Just a moment ...’ appeared on the smartphone screen). After responding to 24 vignettes, participants completed
the  CPRS.  Finally,  participants  answered  the  question,  ‘What  do  you  think  the  study  was  about?’  to  assess  whether  they 
were  aware  of  the  purpose  of  the  study.  None  of  our  participants  were  able  to  accurately  state  the  purpose  of  the  study  other 
than  what  was  told  to  them  in  the  instructions  (evaluating  different  apps).  Because  the  trust  and  likelihood  to  follow  ratings 
were highly correlated (r = 0.83, p < 0.05), only trust ratings were analysed.

5.2  Results 

5.2.1  Hypothesised  model 

To  answer  our  original  research  questions  a  two-level  hierarchical  model  assessed  the  effects  of  agent  gender  and  age, 
decision  aid  reliability,  and  diabetes  knowledge  on  perceptions  of  trust  in  the  decision  aid.  To  review,  our  questions  were  (1) 
How is trust in an anthropomorphic decision aid affected by a user’s age and gender?, (2) How is trust in the smartphone app
affected  by  its  appearance  and  reliability?,  and  (3)  How  is  trust  affected  by  domain  knowledge? 
Multiple responses were nested within the 107 participants: each participant judged 24 vignettes, resulting in a total of
2568  judgments  for  analysis.  These  judgments  were  nested  within  the  manipulations  performed  on  the  survey  (agent  age, 
agent  gender,  reliability),  which  were  in  turn  nested  within  the  attributes  of  the  participant  (participant  age,  participant 
gender,  diabetes  knowledge  score,  CPRS  score).  Multi-level  modelling  was  implemented  through  SAS,  version  9.2. 

Multi-level  models  are  appropriate  for  data  that  exhibit  hierarchical  structure  as  they  account  for  variability  between  and 
within  participants  and  allow  for  examination  of  cross-level  interactions  (Raudenbush  and  Bryk  2002).  Because 
respondents  repeatedly  made  judgments  on  varying  vignettes,  those  judgments  of  trust  were  not  independent  of  each  other; 
in  fact,  they  were  highly  likely  to  be  correlated  which  violates  the  independence  of  error  variances  assumption  of  analysis  of 
variance  (ANOVA)  and  regression  (Hox  and  Bechger  1998;  Tabachnick  and  Fidell  2007).  There  were  also  likely  to  be 
correlations  between  different  levels  (response  level,  group  level).  For  example,  trust  responses  on  a  vignette  would  likely 
be correlated with the responder’s group (gender, age group). That is, males may have a different stereotype than females (or
older  respondents  versus  younger  ones)  that  they  applied  to  the  situation.  Ignoring  this  hierarchical  structure,  or  nesting,  (i.e. 
by  using  ordinary  least  squares  regression)  can  lead  to  an  inflated  Type  I  error  rate,  or  detecting  effects  when  there  are  none 
(Tabachnick  and  Fidell,  2007).  Multi-level  modelling  solves  this  problem  by  allowing  intercepts  and  slopes  between  levels 
to  vary.  Variability  at  one  level  is  treated  as  a  dependent  variable  at  the  next  level.  Hoffman  and  Rovine  (2007)  provided  an 
accessible  description  of  the  usefulness  of  multi-level  linear  models  in  experimental  psychology  and  human  factors  and 
Hox, Kreft, and Hermkens (1991) detailed why multi-level modelling is preferred for the analysis of factorial surveys.

A fully unconditional (non-multivariate) model (Model 1) was used to discover the amount of variance in trust found
within participants at the survey level (Level 1; variance due to app appearance) and the amount of variance at the person
level (Level 2; variance due to individual differences). This model represented a baseline to assess the fit of subsequent
multivariate models (Models 2 and 3; equations in Appendix). Results (Table 3) revealed significant variance at both levels,
with 94% of the variance at the survey level (σ² = 3.04, z = 35.08, p < 0.0001) and 6% of the variance at the person level
(τ₀₀ = 0.19, z = 4.39, p < 0.0001).
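
The 94% and 6% shares follow directly from the two variance estimates reported for Model 1 (3.04 at the survey level and 0.19 at the person level). The Python sketch below reproduces that arithmetic and, as an added illustration that is not part of the original analysis, applies the standard design-effect formula 1 + (m - 1) x ICC to show how much 24 correlated judgments per person shrink the effective sample size for a person-level quantity such as the overall mean. Function and variable names are hypothetical.

```python
def variance_partition(survey_variance, person_variance):
    """Return the shares of total variance at the survey and person levels."""
    total = survey_variance + person_variance
    return survey_variance / total, person_variance / total


def design_effect(icc, cluster_size):
    """Design effect for equally sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


# Variance estimates reported for the unconditional model (Model 1).
survey_share, person_share = variance_partition(3.04, 0.19)

# The person-level share is the intraclass correlation (ICC).
deff = design_effect(person_share, cluster_size=24)

# 107 participants x 24 vignettes = 2568 judgments in total.
effective_n = (107 * 24) / deff

print(f"survey-level share: {survey_share:.2f}")
print(f"person-level share (ICC): {person_share:.2f}")
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {effective_n:.0f}")
```

Even a small intraclass correlation matters here because each person contributes 24 judgments: treating all 2568 ratings as independent would overstate the information available for person-level comparisons, which is the Type I error concern raised in the text.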

Model 2 examined the effects of the survey manipulations on judgments of trust: agent gender, agent age, reliability, and
all Level 1 interactions. Results revealed significant effects for all survey manipulations. Participants trusted male agents
more than female ones, older agents more than younger ones, and more reliable apps than less reliable ones. However,
multiple significant interactions further refined this story. The three-way interaction of agent gender, agent age, and app
reliability was significant (illustrated in Figure 3), such that when the app was of low reliability, the younger female agent
was trusted significantly less than the younger male aid, F(1,1272) = 24.64, p < 0.05, ηp² = 0.2, although there were no
significant differences of agent gender for the younger agent at other reliability levels. For the older aid, the female agent
was rated as less trusted, but this difference occurred only at the medium reliability level, F(1,1272) = 13.91, p < 0.05,
ηp² = 0.01. These findings are consistent with our hypothesis that stereotypes would affect trust judgments when the
reliability of a system was ambiguous (i.e. low or medium reliability).


Table 3.  Unstandardised coefficients of multi-level models of the within and between-person effects of predictors on trust.

                                                Model 1               Model 2               Model 3
                                                Unconditional         Random coefficients   Slopes and
                                                model                 regression            intercepts
                                                Estimate    SE        Estimate    SE        Estimate    SE
Fixed effects
  Intercept                                     [missing]   0.05      3.52***     0.11      3.43***     0.13
  Between-person
    Age group                                                                               0.35*       0.14
    Gender                                                                                  -0.13       0.12
    Diabetes knowledge score                                                                -0.06**     0.02
    CPRS                                                                                    -0.01       0.01
  Within-person
    Agent gender                                                      0.67***     0.14      0.67***     0.14
    Agent age                                                         0.38**      0.14      0.38**      0.14
    Reliability of agent                                              1.09***     0.08      1.09***     0.08
    Agent gender X agent age                                          -0.42*      0.20      -0.42*      0.20
    Agent gender X reliability                                        -0.43***    0.11      -0.44***    0.12
    Agent age X reliability                                           -0.36***    0.11      -0.19       0.12
    Agent gender X agent age X reliability                            0.47**      0.15      0.47**      0.15
  Cross-level
    Age group X agent age group X reliability                                               -0.35***    0.09
    Gender X agent gender X reliability                                                     0.09        0.09
    Age group X agent gender X reliability                                                  -0.06       0.09
    Gender X age group X reliability                                                        -0.04       0.09
  R² within-person                                                    16.02                 16.55
  R² between-person                                                   <0.01                 <0.01
Random effects
  σ² (survey level)                             3.04***     0.09      2.56***     0.07      2.54***     0.07
  τ₀₀ (person level)                            0.19***     0.04      0.21***     0.04      0.20***     0.04

*p < 0.05, **p < 0.01, ***p < 0.001. All between-person predictors were grand-mean centred.

Figure 3. Three-way interaction of agent age group, agent gender, and reliability (from Model 2).
[Two panels, ‘Agent Age Group: Younger aid’ and ‘Agent Age Group: Older aid’; x-axis: aid reliability; separate series for female and male aids; error bars: +/- 1 SE.]

A third model was conducted to include the individual difference predictors of participant age group, participant gender,
CPRS, and diabetes knowledge and to examine hypothesised cross-level interactions. Our hypothesis was that participant
age group would interact with the age of the agent to differentially affect trust. The similarity-attraction hypothesis (Byrne
1971) would predict that the user’s trust would be highest with agents that appear similar to them, particularly in age
appearance. We examined all cross-level interactions in Model 3.

In Model 3, those with higher diabetes knowledge rated the agents as less trusted overall. Older participants generally
rated the agents as more trusted than did younger participants. This may be a manifestation of the generally higher
complacency that older adults have with automation than younger adults (Ho, Wheatley, and Scialfa 2005). Gender of the
participant and CPRS score had no effect on trust ratings. By entering these variables in the model they were controlled for
when examining the cross-level interactions. Using Akaike’s information criterion, Model 3 was determined to fit
the data better than Model 2 (it accounted for variance beyond Model 2). The three-way interaction among participant age group,
agent age, and app reliability was significant (Figure 4). The source of the interaction was that younger adults in the low
reliability condition tended to trust older agents significantly more than younger agents, F(1,1434) = 16.88, p < 0.05,
ηp² = 0.006. There was no significant difference in trust by younger adults as a function of agent age in the medium or high
reliability conditions. For older adults, there was no significant difference in trust as a function of agent age in any of the
reliability conditions. Finally, to more directly test the possibility, presented in the introduction, that older adults may
specifically hold negative stereotypes of young female agents, we examined the four-way interaction of agent age, agent
gender, age group, and gender and found it to be not significant.

Figure 4. Cross-level interaction of participant age group, agent age group, and reliability (from Model 3).
[Two panels, ‘Participant Age Group: Younger Adults’ and ‘Participant Age Group: Older Adults’; x-axis: aid reliability; separate series for younger and older aids; error bars: +/- 1 SE.]


6.  General  discussion 

As  automation  in  consumer  products  and  systems  embodies  human-like  traits  (e.g.  anthropomorphic  agents),  stereotypes 
that  users  hold  of  age  and  gender  may  play  an  important  role  in  trust  and  use  of  that  automation.  Prior  research  established 
that  people  apply  gender  stereotypes  to  computers  but  the  purpose  of  this  study  was  to  examine  if  powerful  and  pervasive 
age  stereotypes,  as  well  as  gender  stereotypes,  would  be  applied  to  anthropomorphic  agents. 

The  finding  that  trust  varies  with  reliability  is  not  surprising;  with  higher  levels  of  perceived  reliability,  users, 
particularly  older  adults,  may  become  complacent  (Mouloua  et  al.  2002;  Ho,  Wheatley,  and  Scialfa  2005).  What  is 
surprising  is  that  this  relationship  between  trust  and  complacency  interacts  with  attributes  of  technology  and  individual 
differences  in  a  way  that  is  roughly  consistent  with  the  stereotype  literature,  specifically,  age  and  gender  stereotypes  of 
doctors.  However,  perceived  age  group  and  gender  of  the  agent  and  its  reliability  moderated  the  application  of  stereotypes 
(Model  2).  When  the  agent  appeared  young,  male  agents  were  more  trusted  than  female  agents  only  when  reliability  was 
low.  This  gender  difference  disappeared  at  other  levels  of  reliability.  This  pattern  might  suggest  that  unless  the  reliability  of 
the  system  is  catastrophically  low  (45%),  most  participants  do  not  exhibit  gender  stereotypic  thinking;  perceptions  of  trust 
are  primarily  driven  by  reliability.  However,  when  the  reliability  is  very  low,  participants  clearly  shift  to  more  stereotypic 
thinking  and  seem  to  attribute  low  performance  to  gender. 

When  the  agent  appeared  older,  male  agents  were  more  trusted  than  female  agents  only  at  medium  levels  of  reliability. 
That  is,  stereotypic  judgments  appear  at  more  moderate  levels  of  reliability  (70%  versus  45%)  if  the  aid  is  older  rather  than 
younger.  The  finding  of  gender  stereotypic  effects  at  45%  reliability  when  the  agent  is  young,  but  at  70%  when  the  agent  is 
old  seems  to  suggest  that  older  female  agents  are  judged  more  harshly  than  younger  female  agents.  Given  this  finding  one 
design  recommendation  is  that  when  it  is  crucial  for  users  to  maintain  high  levels  of  trust  in  imperfect  automation,  a  younger 
male  agent  is  optimal  because  it  seems  less  susceptible  to  large  fluctuations  in  perceptions  of  trust  as  a  function  of  gender 
(i.e.  gender  stereotypic  thinking).  More  specifically,  if  it  is  undesirable  to  have  users  exhibit  gender  differences  (or  bias)  in 
trust, then using younger agents is preferable to using older agents. A male agent is recommended over a female agent because trust in
female agents appeared more erratic as a function of reliability than trust in male agents (e.g. the steep plunge in trust at
45%  reliability  for  young  females).  However,  this  design  recommendation  does  not  take  into  account  the  gender  or  age 
group  of  the  user.  As  the  significant  cross-level  interaction  of  Model  3  shows,  individual  differences  also  seem  to  interact 
with  the  agent  characteristics. 

Model  3  showed  that  some  anthropomorphic  aspects  of  the  aid  did  interact  with  participant  individual  differences  to 
affect  trust.  Younger  adults  in  low  reliability  conditions  tended  to  trust  older  agents  over  younger  agents  while  older  adults 
did  not  show  any  significant  differences  in  trust  as  a  function  of  agent  age.  Based  on  Model  3,  if  the  goal  is  to  maintain  high 
levels  of  trust  in  imperfect  automation  in  young  adult  users,  older  agents  (regardless  of  agent  gender)  are  preferred.  For 
older  adult  users,  there  was  no  significant  difference  in  trust  as  a  function  of  agent  age  group.  However,  there  did  appear  to 
be  a  trend  towards  higher  trust  of  younger  agents  with  increasing  reliability  so  for  older  users,  a  young  agent  may  be 
optimal. 

One caveat is that we did not assess a priori the pre-existing stereotypes held by our participants (as such an assessment
might have influenced their behaviour in the experiment). However, the stereotype literature is replete with research that
shows the pervasiveness of the ‘warm but not competent’ stereotype of older adults not only in the USA but worldwide
(Cuddy, Norton, and Fiske 2005). Another limitation is the use of a diabetes scenario. Although none of the participants in
our study reported having diabetes, older adults may be more aware of diabetes simply because it is more common in their
cohort than among younger adults (26.9% versus 11.3%, respectively; American Diabetes Association, 2011). Thus, simply
being in a cohort that is more affected by diabetes may influence how one perceives diabetes advice. A further limitation is
that we assessed only subjective perceptions of the automation (trust), and it is uncertain whether trust translates to
behaviour. However, past research has shown that perceptions of trust in automation are strongly correlated with behaviour
(e.g. Lee and Moray 1994).


Acknowledgements 

We  thank  Meghan  Goodwin  and  Kayla  Brennan  for  their  help  in  collecting  data  and  Peg  Tyler  for  her  help  in  manuscript  preparation. 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Downloaded  by  [Clemson  University]  at  06:30  20  June  2014 


Ergonomics 




Funding 

This  research  was  supported  by  a  grant  from  the  Air  Force  Office  of  Scientific  Research  [award  number  FA9550-12-1-0385]. 


Notes  on  contributors 

Richard Pak is currently an Associate Professor in the Department of Psychology at Clemson University. He received his PhD in
psychology  in  2005  from  the  Georgia  Institute  of  Technology. 

Anne  McLaughlin  is  an  Associate  Professor  in  the  Department  of  Psychology  at  North  Carolina  State  University.  She  received  her  PhD  in 
psychology  in  2007  from  the  Georgia  Institute  of  Technology. 

Brock  Bass  received  his  MS  in  Human  Factors  from  Clemson  University. 


References 

American Diabetes Association. 2011. “Statistics About Diabetes.” Retrieved from http://www.diabetes.org/diabetes-basics/statistics/?loc=db-slabnav

Banaji,  M.,  and  C.  D.  Hardin.  1996.  “Automatic  Stereotyping.”  Psychological  Science  7  (3):  136-141. 

Banaji,  M.  R.,  C.  Hardin,  and  A.  J.  Rothman.  1993.  “Implicit  Stereotyping  in  Person  Judgment.”  Journal  of  Personality  and  Social 
Psychology  65  (2):  272-281. 

Bickmore,  T.  2011.  “Etiquette  in  Motivational  Agents.”  In  Human-Computer  Etiquette:  Cultural  Expectations  and  the  Design 
Implications They Place on Computers and Technology, edited by C. C. Hayes and C. A. Miller, 206-226. Boca Raton: Auerbach.

Byrne, D. E. 1971. The Attraction Paradigm (Vol. 11). New York: Academic Press.

Casper,  C.,  K.  Rothermund,  and  D.  Wentura.  2011.  “The  Activation  of  Specific  Facets  of  Age  Stereotypes  Depends  on  Individuating 
Information.”  Social  Cognition  29  (4):  393-414. 

Cleveland,  J.  N.,  and  G.  Hollmann.  1990.  “The  Effects  of  the  Age-Type  of  Tasks  and  Incumbent  Age  Compositions  on  Job  Perceptions.” 
Journal  of  Vocational  Behaviour  36  (2):  181-194. 

Cuddy, A. J., M. I. Norton, and S. T. Fiske. 2005. “This Old Stereotype: The Pervasiveness and Persistence of the Elderly Stereotype.”
Journal  of  Social  Issues  61  (2):  267-285. 

Czaja,  S.  J.,  and  J.  Sharit.  1998.  “Age  Differences  in  Attitudes  toward  Computers.”  The  Journals  of  Gerontology  Series  B:  Psychological 
Sciences  and  Social  Sciences  53  (5):  329-340. 

DeArmond,  S.,  M.  Tye,  P.  Y.  Chen,  A.  Krauss,  D.  Apryl  Rogers,  and  E.  Sintek.  2006.  “Age  and  Gender  Stereotypes:  New  Challenges  in  a 
Changing  Workplace  and  Workforce.”  Journal  of  Applied  Social  Psychology  36  (9):  2184-2214. 

Devine,  P.  G.  1989.  “Stereotypes  and  Prejudice:  Their  Automatic  and  Controlled  Components.”  Journal  of  Personality  and  Social 
Psychology  56  (1):  5-18. 

Drucquer,  M.  H.,  and  P.  G.  McNally.  1998.  Diabetes  Management:  Step  by  Step.  Osney  Mead,  Oxford:  Blackwell  Science. 

Dzindolet, M. T., S. A. Peterson, R. A. Pomranky, L. G. Pierce, and H. P. Beck. 2003. “The Role of Trust in Automation Reliance.”
International Journal of Human Computer Studies 58 (6): 697-718.

Dzindolet,  M.  T.,  L.  G.  Pierce,  H.  P.  Beck,  and  L.  A.  Dawe.  2002.  “The  Perceived  Utility  of  Human  and  Automated  Aids  in  a  Visual 
Detection Task.” Human Factors 44 (1): 79-94.

Finkelstein, L. M., and M. J. Burke. 1998. “Age Stereotyping at Work: The Role of Rater and Contextual Factors on Evaluations of Job
Applicants.”  The  Journal  of  General  Psychology  125  (4):  317-345. 

Fiske, S. T. 1998. “Stereotyping, Prejudice, and Discrimination.” In The Handbook of Social Psychology, edited by D. T. Gilbert,
S. T. Fiske, and G. Lindzey. 4th ed. New York: McGraw-Hill.

Fitzgerald,  J.  T.,  R.  M.  Anderson,  M.  M.  Funnell,  R.  G.  Hiss,  G.  E.  Hess,  W.  K.  Davis,  and  P.  A.  Barr.  1998.  “The  Reliability  and  Validity 
of  a  Brief  Diabetes  Knowledge  Test.”  Diabetes  Care  21  (5):  706-710. 

Greenwald,  A.  G.,  and  M.  R.  Banaji.  1995.  “Implicit  Social  Cognition:  Attitudes,  Self-esteem,  and  Stereotypes.”  Psychological  Review 
102  (1):  4-27. 

Hayes,  C.  C.,  and  C.  A.  Miller.  2011.  Human-Computer  Etiquette:  Cultural  Expectations  and  the  Design  Implications  They  Place  on 
Computers  and  Technology.  Boca  Raton:  Auerbach. 

Hilton,  J.  L.,  and  W.  von  Hippel.  1996.  “Stereotypes.”  Annual  Review  of  Psychology  47  (1):  237-271. 

Ho,  G.,  D.  Wheatley,  and  C.  T.  Scialfa.  2005.  “Age  Differences  in  Trust  and  Reliance  of  a  Medication  Management  System.”  Interacting 
with  Computers  17  (6):  690-710. 

Hoffman,  L.,  and  M.  J.  Rovine.  2007.  “Multilevel  Models  for  the  Experimental  Psychologist:  Foundations  and  Illustrative  Examples.” 
Behaviour  Research  Methods  39  (1):  101-117. 

Hox,  J.  J.,  and  T.  M.  Bechger.  1998.  “An  Introduction  to  Structural  Equation  Modeling.”  Family  Science  Review  11:  354-373. 

Hox, J. J., I. G. G. Kreft, and P. L. J. Hermkens. 1991. “The Analysis of Factorial Surveys.” Sociological Methods & Research 19:
493-510. 

Johnson, D., J. Gardner, and J. Wiles. 2004. “Experience as a Moderator of the Media Equation: The Impact of Flattery and Praise.”
International Journal of Human-Computer Studies 61 (3): 237-258.

Kite,  M.  E.,  K.  Deaux,  and  M.  Miele.  1991.  “Stereotypes  of  Young  and  Old:  Does  Age  Outweigh  Gender?”  Psychology  and  Aging  6  (1): 
19-27. 

Kite, M. E., G. D. Stockdale, B. E. Whitley, and B. T. Johnson. 2005. “Attitudes toward Younger and Older Adults: An Updated Meta-
analytic  Review.”  Journal  of  Social  Issues  61  (2):  241-266. 




Kunda,  Z.,  and  B.  Sherman-Williams.  1993.  “Stereotypes  and  the  Construal  of  Individuating  Information.”  Personality  and  Social 
Psychology Bulletin 19 (1): 90-99.

Kunda,  Z.,  and  P.  Thagard.  1996.  “Forming  Impressions  from  Stereotypes,  Traits,  and  Behaviours:  A  Parallel  Constraint  Satisfaction 
Theory.”  Psychological  Review  103  (2):  284-308. 

Lee, E. J. 2003. “Effects of ‘Gender’ of the Computer on Informational Social Influence: The Moderating Role of Task Type.”
International  Journal  of  Human-Computer  Studies  58  (4):  347-362. 

Lee,  E.-J.  2010.  “The  More  Humanlike,  the  Better?  How  Speech  Type  and  Users’  Cognitive  Style  Affect  Social  Responses  to 
Computers.”  Computers  in  Human  Behaviour  26  (4):  665-672. 

Lee,  J.  D.,  and  N.  Moray.  1994.  “Trust,  Self-confidence,  and  Operators’  Adaptation  to  Automation.”  International  Journal  of  Human 
Computer  Studies  40  (1):  153-184. 

Lee,  J.  D.,  and  K.  A.  See.  2004.  “Trust  in  Automation:  Designing  for  Appropriate  Reliance.”  Human  Factors  46  (1):  50-80. 

Madhavan,  P.,  D.  A.  Wiegmann,  and  F.  C.  Lacson.  2006.  “Automation  Failures  on  Tasks  Easily  Performed  by  Operators  Undermine  Trust 
in  Automated  Aids.”  Human  Factors  48  (2):  241-256. 

Marakas,  G.  M.,  R.  D.  Johnson,  and  J.  W.  Palmer.  2000.  “A  Theoretical  Model  of  Differential  Social  Attributions  toward  Computing 
Technology:  When  the  Metaphor  Becomes  the  Model.”  International  Journal  of  Human-Computer  Studies  52  (4):  719-750. 

McKinstry,  B.,  and  S.  Y.  Yang.  1994.  “Do  Patients  Care  about  the  Age  of  Their  General  Practitioner?  A  Questionnaire  Survey  in  Five 
Practices.”  The  British  Journal  of  General  Practice  44  (385):  349-351. 

Merritt,  S.  M.,  H.  Heimbaugh,  and  J.  LaChapell.  2012.  “I  Trust  It,  but  I  Don’t  Know  Why:  Effects  of  Implicit  Attitudes  toward 
Automation  on  Trust  in  an  Automated  System.”  Human  Factors  55  (3):  520-534. 

Merritt,  S.  M.,  and  D.  R.  Ilgen.  2008.  “Not  All  Trust  Is  Created  Equal:  Dispositional  and  History-Based  Trust  in  Human  Automation 
Interactions.”  Human  Factors  50  (2):  194-210. 

Mosier,  K.  L.,  L.  J.  Skitka,  M.  D.  Burdick,  and  S.  T.  Heers.  1996.  “Automation  Bias,  Accountability,  and  Verification  Behaviours.”  In 
Proceedings  of  the  Human  Factors  and  Ergonomics  Society  Annual  Meeting  40  (4):  204-208. 

Mouloua, M., J. A.-A. Smither, D. A. Vincenzi, and L. Smith. 2002. “Automation and Aging: Issues and Considerations.” Advances in
Human  Performance  and  Cognitive  Engineering  Research  2:  213-237. 

Nass,  C.,  J.  Steuer,  and  E.  R.  Tauber.  1994.  “Computers  Are  Social  Actors.”  In  Proceedings  of  the  SIGCHI  Conference  on  Human  Factors 
in  Computing  Systems:  Celebrating  Interdependence,  72-78. 

Pak,  R.,  N.  Fink,  M.  Price,  B.  Bass,  and  L.  Sturre.  2012.  “Decision  Support  Aids  with  Anthropomorphic  Characteristics  Influence  Trust 
and  Performance  in  Younger  and  Older  Adults.”  Ergonomics  55  (9):  1059-1072. 

Parasuraman,  R.,  and  C.  A.  Miller.  2004.  “Trust  and  Etiquette  in  High-Criticality  Automated  Systems.”  Communications  of  the  ACM 
47  (4):  51-55. 

Perdue,  C.  W.,  and  M.  B.  Gurtman.  1990.  “Evidence  for  the  Automaticity  of  Ageism.”  Journal  of  Experimental  Social  Psychology  26  (3): 
199-216. 

Posthuma,  R.  A.,  and  M.  A.  Campion.  2009.  “Age  Stereotypes  in  the  Workplace:  Common  Stereotypes,  Moderators,  and  Future  Research 
Directions?”  Journal  of  Management  35  (1):  158-188. 

Qiu, L., and I. Benbasat. 2009. “Evaluating Anthropomorphic Product Recommendation Agents: A Social Relationship Perspective to
Designing  Information  Systems.”  Journal  of  Management  Information  Systems  25  (4):  145-182. 

Raudenbush,  S.  W.,  and  A.  S.  Bryk.  2002.  Hierarchical  Linear  Models.  2nd  ed.  Thousand  Oaks,  CA:  Sage. 

Real Software. 2013. [Computer software]. Austin, TX.

Rosen,  B.,  and  T.  H.  Jerdee.  1976.  “The  Nature  of  Job-Related  Age  Stereotypes.”  Journal  of  Applied  Psychology  61  (2):  180-183. 

Rossi, P. H., and A. B. Anderson. 1982. “The Factorial Survey Approach: An Introduction.” In Measuring Social Judgments, edited by
P. H. Rossi and S. L. Nock. Beverly Hills: Sage.

Rovira,  E.,  K.  McGarry,  and  R.  Parasuraman.  2007.  “Effects  of  Imperfect  Automation  on  Decision  Making  in  a  Simulated  Command  and 
Control Task.” Human Factors 49 (1): 76-87.

Rottger,  S.,  K.  Bali,  and  D.  Manzey.  2009.  “Impact  of  Automated  Decision  Aids  on  Performance,  Operator  Behaviour  and  Workload  in  a 
Simulated  Supervisory  Control  Task.”  Ergonomics  52  (5):  512-523. 

Shah,  R.,  and  J.  Ogden.  2006.  “What’s  in  a  Face?  The  Role  of  Doctor  Ethnicity,  Age  and  Gender  in  the  Formation  of  Patients’ 
Judgements:  An  Experimental  Study.”  Patient  Education  and  Counseling  60  (2):  136-141. 

Singer, M. S. 1986. “Age Stereotypes as a Function of Profession.” The Journal of Social Psychology 126 (5): 691-692.

Singh, I. L., R. Molloy, and R. Parasuraman. 1993. “Individual Differences in Monitoring Failures of Automation.” The Journal of
General  Psychology  120  (3):  357-373. 

Skitka,  L.,  K.  Mosier,  and  M.  Burdick.  1999.  “Does  Automation  Bias  Decision-Making?”  International  Journal  of  Human-Computer 
Studies  51  (5):  991-1006. 

Tabachnick,  B.  G.,  and  L.  S.  Fidell.  2007.  “Multi-Level  Linear  Modeling.”  In  Using  Multivariate  Statistics.  5th  ed.,  781-857.  Boston, 
MA:  Pearson. 

Wickens,  C.  D.,  and  S.  R.  Dixon.  2007.  “The  Benefits  of  Imperfect  Diagnostic  Automation:  A  Synthesis  of  the  Literature.”  Theoretical 
Issues  in  Ergonomics  Science  8  (3):  201-212. 

Wrenn,  K.  A.,  and  T.  J.  Maurer.  2004.  “Beliefs  about  Older  Workers’  Learning  and  Development  Behavior  in  Relation  to  Beliefs  about 
Malleability  of  Skills,  Age-Related  Decline,  and  Control.”  Journal  of  Applied  Social  Psychology  34  (2):  223-242. 

Yee, N., J. N. Bailenson, and K. Rickertsen. 2007. “A Meta-Analysis of the Impact of the Inclusion and Realism of Human-Like Faces on
User Experiences in Interfaces.” In Presented at the CHI 07: Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems, New York, NY: ACM.

Zhang,  T.,  B.  Zhu,  and  D.  B.  Kaber.  2011.  “Anthropomorphism  and  Social  Robots.”  In  Human-Computer  Etiquette:  Cultural 
Expectations  and  the  Design  Implications  They  Place  on  Computers  and  Technology,  edited  by  C.  C.  Hayes  and  C.  A.  Miller, 
231-255.  Boca  Raton:  Auerbach. 




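Appendix. Multi-level model

Model 2

Level 1:

\[
\begin{aligned}
\mathrm{TRUST}_{it} = {} & \beta_{0it} + \beta_{1it}(\mathrm{AgntGndr}) + \beta_{2it}(\mathrm{AgntAge}) + \beta_{3it}(\mathrm{Reliab}) + \beta_{4it}(\mathrm{AgntGndr} \times \mathrm{AgntAge}) \\
& + \beta_{5it}(\mathrm{AgntGndr} \times \mathrm{Reliab}) + \beta_{6it}(\mathrm{AgntAge} \times \mathrm{Reliab}) + \beta_{7it}(\mathrm{AgntGndr} \times \mathrm{AgntAge} \times \mathrm{Reliab}) + r_{it}
\end{aligned}
\]

Level 2:

\[
\begin{aligned}
\beta_{0i} &= \gamma_{00} + u_{0i} \\
\beta_{1i} &= \gamma_{10} \\
\beta_{2i} &= \gamma_{20} \\
\beta_{3i} &= \gamma_{30} \\
\beta_{4i} &= \gamma_{40} \\
\beta_{5i} &= \gamma_{50} \\
\beta_{6i} &= \gamma_{60} \\
\beta_{7i} &= \gamma_{70}
\end{aligned}
\]

Model 3

Level 1:

\[
\begin{aligned}
\mathrm{TRUST}_{it} = {} & \beta_{0it} + \beta_{1it}(\mathrm{AgntGndr}) + \beta_{2it}(\mathrm{AgntAge}) + \beta_{3it}(\mathrm{Reliab}) + \beta_{4it}(\mathrm{AgntGndr} \times \mathrm{AgntAge}) \\
& + \beta_{5it}(\mathrm{AgntGndr} \times \mathrm{Reliab}) + \beta_{6it}(\mathrm{AgntAge} \times \mathrm{Reliab}) + \beta_{7it}(\mathrm{AgntGndr} \times \mathrm{AgntAge} \times \mathrm{Reliab}) + r_{it}
\end{aligned}
\]

Level 2:

\[
\begin{aligned}
\beta_{0i} &= \gamma_{00} + \gamma_{01}(\mathrm{AGE}) + \gamma_{02}(\mathrm{GENDER}) + \gamma_{03}(\mathrm{DKS}) + \gamma_{04}(\mathrm{CPRS}) + u_{0i} \\
\beta_{1i} &= \gamma_{10} \\
\beta_{2i} &= \gamma_{20} \\
\beta_{3i} &= \gamma_{30} + \gamma_{31}(\mathrm{AGE} \times \mathrm{GENDER}) \\
\beta_{4i} &= \gamma_{40} \\
\beta_{5i} &= \gamma_{50} + \gamma_{51}(\mathrm{GENDER}) + \gamma_{52}(\mathrm{AGE}) \\
\beta_{6i} &= \gamma_{60} + \gamma_{61}(\mathrm{AGE}) \\
\beta_{7i} &= \gamma_{70}
\end{aligned}
\]


Abstract: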

Objective:  We  explored  the  extent  to  which  individual  differences  in  cognitive  ability 
affected  the  use  of  types  and  levels  of  automation  support  in  a  complex  decision-making  task. 

Background:  Studies  show  performance  benefits  with  reliable  automation  but 
performance  costs  with  imperfect  automation,  particularly  as  automation  support  increases. 
Cognitive  abilities  are  also  critical  to  decision-making  and  correlate  with  automation  reliance. 

Method: We examined decision-making performance with varying types and levels of
imperfect  automation  that  supported  86  participants  performing  a  simulated  command  and 
control  task.  Participants  completed  measures  of  attentional  control  and  spatial  working  memory. 

Results:  Automation  reliability,  support,  and  task  load  interacted  to  affect  accuracy. 
Additionally, working memory ability interacted with reliability and automation support.
Reliable automation with increased automation support resulted in higher accuracies. When
automation  was  imperfect,  the  reverse  was  true:  increased  automation  support  resulted  in  lower 
accuracy,  especially  for  those  with  lower  working  memory  ability.  Those  with  higher  working 
memory  were  less  susceptible  to  the  detrimental  effects  of  increasingly  supportive,  but  imperfect, 
automation.  Further,  lower  working  memory  was  associated  with  more  trust  in  automation. 

Conclusion:  These  results  confirm  the  link  between  automation  performance  and 
individual  differences,  but  also  demonstrate  the  limits  of  the  conventional  wisdom  that  higher, 
reliable  automation  support  unilaterally  helps  performance  while  higher,  imperfect  automation 
support  harms  performance. 

Application:  Optimizing  human  system  performance  requires  understanding  how 
individual  variability  contributes  to  performance  with  automation.  These  results  may  apply  to 
the  design  of  systems  that  accommodate  individual  differences  in  abilities  through  interface 
design  and  personnel  selection. 


Keywords:  human  automation  interaction,  types  and  levels  of  automation,  individual 
differences,  working  memory,  attention,  trust,  task  load,  and  mental  workload 

Precis: The extent to which individual differences in cognitive ability affected the use of imperfect
types and levels of automation in complex decision-making was investigated. It was found that
increased working memory capacity buffered the performance costs of imperfect decision
automation and enhanced the benefit of automation support.


INTRODUCTION

Background 

Commercial  pilots  are  supported  by  sophisticated  technology  in  the  cockpit,  soldiers  are 
supported with automated targeting systems, drivers are supported by rear cameras in cars, and
many  mobile  phones  use  voice  recognition  software  to  assist  users  in  searching  for  information. 
Each  of  these  examples  of  automation  could  be  characterized  along  two  dimensions:  what  stage 
of  information  processing  they  support  (type  of  automation:  information  acquisition,  information 
analysis,  decision-making,  or  action  implementation)  and  how  much  they  support  the  operator 
(level  of  automation:  from  a  low  level  to  a  highly  autonomous  level;  Sheridan  &  Verplank,  1978; 
Parasuraman,  Sheridan,  &  Wickens,  2000). 

A  growing  body  of  research  has  examined  how  human  performance  is  differentially 
affected  by  various  types  and  levels  of  highly  reliable  but  imperfect  automation  (Crocoll  & 
Coury,  1990;  Endsley  &  Kaber,  1999;  Galster,  Bolia,  &  Parasuraman,  2002;  Lorenz,  Di  Nocera, 
Rottger,  &  Parasuraman,  2002;  Sarter  &  Schroeder,  2001;  Wickens  &  Xu,  2002;  Rovira, 
McGarry,  &  Parasuraman,  2007;  Onnasch,  Wickens,  Li,  &  Manzey,  2014).  The  interest  is 
motivated  by  the  severe  negative  human  performance  consequences  of  imperfect  automation 
such as out-of-the-loop unfamiliarity (Wickens, 1992), automation complacency (Parasuraman,
Molloy,  &  Singh,  1993),  loss  of  situation  awareness  (Endsley  &  Kiris,  1995),  and  skill 
degradation  (Bainbridge,  1983). 

In a meta-analysis of 18 automation studies examining the differential effects of types and
levels of automation, Onnasch et al. (2014) found performance benefits with reliable automation
and performance decrements with higher types and levels of automation. Of most interest were
the decrements in performance found when automation support moved across the critical
boundary from information automation to decision automation. Thus, an important goal for
designers is to mitigate performance costs associated with higher types and levels of automation
by facilitating appropriate trust calibration (e.g., Rovira, Cross, Leitch, & Bonaceto, 2014). One
approach is to better understand how individual differences in cognitive ability affect the
appropriate use of imperfect types and levels of automation in complex decision-making tasks.

Individual Differences

Some  early  research  has  explored  sources  of  individual  differences  and  performance  with 
automation  (e.g.,  Singh,  Molloy,  &  Parasuraman,  1993).  However,  these  early  investigations 
have  focused  on  what  could  be  considered  personality  characteristics  (e.g.,  complacency 
potential; Singh et al., 1993). Another source of individual differences may be cognitive abilities,
specifically working memory capacity (Baddeley, 1986) and visuospatial attention (Gopher,
1982). It is also well established that working memory and attention are critical abilities that
underlie effective decision-making (Lohse, 1997) and reliance on automation (Parasuraman &
Manzey,  2010).  Therefore,  optimizing  human  system  performance  necessitates  the  assessment 
and  understanding  of  how  individual  cognitive  variability  contributes  to  operational  performance 
and  automation  usage. 

In  one  of  the  earlier  studies  examining  automation  performance  and  individual 
differences  in  cognitive  abilities,  Chen  and  Terrence  (2009)  investigated  the  effects  of  imperfect 
automation and individual differences in a military multitask environment. Specifically, they
were interested in whether individual differences in perceived attentional control affected how operators
interacted  with  miss  vs.  false  alarm  prone  automation.  Attentional  control  was  assessed  using  a 
survey  that  measured  individuals’  perceived  attentional  focus  and  shifting.  They  found  that 
individuals with high perceived attentional control were more negatively affected by false
alarms, whereas for individuals with low perceived attentional control, miss prone automation
was  more  harmful.  In  the  context  of  their  task  (military  gunner  and  robotics  operator),  perceived 
attentional  control  was  clearly  an  important  moderator  of  how  operators  reacted  to  automation 
false  alarms  and  misses. 

Individual differences in working memory also seem to play a role in mediating operator
performance with automation. Parasuraman, de Visser, Lin, and Greenwood (2012) examined
whether certain genotypes could predict an individual’s susceptibility to automation bias, that is,
the tendency of operators to adhere to imperfect automation. The researchers looked at two specific
single nucleotide polymorphisms (SNPs), or variants, of the DBH gene that regulate dopamine
(DA) and norepinephrine (NE). DA and NE levels are associated with DBH enzyme activity
(low, high) that contributes to neural activity in the prefrontal cortex, which is known to play a critical role
in working memory ability. Using a command and control task (Rovira et al., 2007),
Parasuraman et al. (2012) varied the automation support (manual, reliable, and imperfect) that
low and high DBH enzyme groups experienced. They found no difference between the low and
high DBH enzyme groups with manual and reliable automation, but with imperfect automation
individuals in the low DBH enzyme group performed better compared to individuals in the high
DBH enzyme group. Parasuraman et al. (2012) attributed this effect to individual differences in
working memory induced by enhanced DA availability in the low DBH enzyme group.
However, because they did not measure working memory or other cognitive abilities, it is still
unclear if individual differences in working memory interact with automation reliability to affect
performance.

The  importance  of  individual  differences  in  working  memory  was  examined  in  another 
study (de Visser, Shaw, Mohamed-Ameen, & Parasuraman, 2010). Researchers investigated the
role of working memory in an automated UAV task by varying task load (low, high) and
automation reliability (manual, reliable, and imperfect). Participants completed both the
Operation Span (OSPAN) and Spatial Span (SSPAN) working memory tests (Engle, 2002).
Researchers found a significant correlation between OSPAN scores and performance on the
automated  task.  For  each  automation  task  performance  measure,  they  found  that  linear  models 
that  included  working  memory  accounted  for  more  of  the  variance  in  performance  as  compared 
to  the  linear  models  without  the  individual  differences  OSPAN  measure.  Thus,  when  individual 
differences  in  working  memory  are  accounted  for,  more  variation  in  performance  with 
automation  can  be  explained.  However,  these  researchers  did  not  investigate  types  and  levels  of 
automation. 

Research  Hypotheses 

This  research  was  aimed  at  understanding  the  sources  of  performance  differences 
underlying  human-automation  interaction  with  imperfect  automation  across  different  types  and 
levels of automation. Many studies have investigated this same topic (for a review see Onnasch
et al., 2014); however, our work is distinct because it examines individual differences. In
addition,  the  current  research  extended  previous  work  in  this  area  in  two  specific  ways.  First,  we 
explicitly  varied  types  and  levels  of  imperfect  automation  and  task  load.  Secondly,  we  more 
directly  measured  individual  differences  in  cognitive  abilities  by  using  well-accepted  working 
memory  and  visuospatial  attention  tasks  compared  to  self-reported  measures  of  abilities, 
complex  proxy  tasks  (e.g.,  video  game  performance),  or  genetic  predictors  of  cognitive 
performance.  Finally,  while  evidence  from  a  review  of  20  automation  reliability  studies 
suggested  that  dependence  on  imperfect  automation  would  be  stronger  with  increased  task 
demand (because the operator’s limited resources are expended; Wickens & Dixon, 2005), this is
the first study that investigated the effects of individual differences in working memory and
visuospatial  attention  on  types  and  levels  of  imperfect  automation  and  varying  task  demand. 

We  hypothesized  that  individual  differences  in  working  memory  and  visuospatial 
attention  would  differentially  impact  reliance  on  varying  types  and  levels  of  automation. 
Specifically: 

1.  First,  consistent  with  previous  literature,  we  hypothesized  that: 

a.  operators  would  perform  better  with  reliable  automation  compared  to  manual 
control. 

b.  there  would  be  no  difference  between  task  load  conditions  when  the  automation 
was  reliable. 

c.  the  differential  impact  of  information  and  decision  automation  would  be  evident 
with  imperfect  automation  especially  with  high  task  load. 

2.  Second,  as  suggested  by  Parasuraman  et  al.  (2012)  we  expected  individuals  with  higher 
working  memory  capacity  to  show  less  of  a  decrement  with  higher  forms  of  imperfect 
decision  automation  as  compared  to  individuals  with  less  working  memory  capacity. 
Specifically,  with  imperfect  automation  or  high  task  load  it  was  predicted  that  the 
benefits  of  better  working  memory  capacity  would  be  highlighted. 

3.  Third,  we  expected  individuals  with  high  visuospatial  ability  to  perform  better  with  high 
task  demand  and  information  automation  as  compared  to  individuals  with  lower  levels  of 
visuospatial  ability.  This  would  be  interesting  as  researchers  currently  recommend  lower 
types and levels of automation when 100% automation reliability cannot be guaranteed
and return to manual control is of concern (Onnasch et al., 2014); however, integrating
large amounts of data may be difficult for some individuals.

4. Finally, we expected a relationship between variations in cognitive ability and self-report
measures  of  trust.  Specifically,  individuals  with  low  cognitive  abilities  would  trust  the 
automation  more  compared  to  individuals  with  high  cognitive  abilities. 

METHODS 

Participants 

A  total  of  86  cadets  (18  women)  from  the  U.S.  Military  Academy  volunteered  and 
participated in this study for extra credit. Ages ranged from 18 to 24 (M = 20.27, SD = 1.25).

Stimuli  and  Task  Procedures 

Participants  completed  this  study  in  two  hours  including  training  and  breaks.  Participants 
first  completed  two  cognitive  measures  followed  by  a  simulated  artillery  sensor-to-shooter 
targeting  task.  Response  time  and  accuracy  were  collected  for  all  measures.  An  anti-saccade  task 
(Unsworth,  Schrock,  &  Engle,  2004)  was  also  administered  to  participants,  but  data  loss 
prevented  analysis  and  so  it  will  not  be  discussed  further. 

Visuospatial  Attention  Task.  A  spatially  cued  letter  discrimination  task  developed  by 
Greenwood  et  al.  (2000)  was  used  to  measure  attentional  control.  First,  a  fixation  point  was 
displayed  for  500  ms  followed  by  a  cue  (an  arrow  pointing  either  left,  right,  or  in  both 
directions).  The  cue  was  either  valid,  predicting  the  subsequent  target  location  on  61.5%  of  the 
trials,  invalid  on  15%,  neutral  on  15%,  or  no  cue  appeared  on  8.5%  of  the  trials.  The  location  cue 
appeared for a cue-target SOA of 500 or 2,000 ms (Figure 1). Next, a letter target appeared to
the right or left of the fixation point. Participants categorized the target letter as either a
consonant or a vowel by using their index fingers to select one of two responses on a keyboard.

[Figure 1 panels: fixation (500 ms); valid or invalid cue (500 or 2,000 ms); target (displayed until response).]

Figure  1.  Visuospatial  attention  measure  used  to  indicate  individual  differences  in  attentional 
control  at  various  SOAs. 
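The cue proportions described above can be illustrated with a short Python sketch that builds a trial list for the cued letter-discrimination task. This is only an illustration under stated assumptions, not the software used in the study: the function name `build_trials`, the default of 200 trials, the equal split of the two SOAs, the equal split of left and right targets, and the equal split of vowels and consonants are all hypothetical choices that the text does not specify.

```python
import random

# Cue-type shares per 1,000 trials, taken from the task description:
# 61.5% valid, 15% invalid, 15% neutral, 8.5% no cue.
CUE_PER_MILLE = {"valid": 615, "invalid": 150, "neutral": 150, "none": 85}
SOAS_MS = (500, 2000)  # cue-target SOAs stated in the text


def build_trials(n_trials=200, seed=0):
    """Return a shuffled list of trial dicts with the stated cue proportions."""
    if n_trials % 200 != 0:
        # 61.5% and 8.5% only give whole trial counts for multiples of 200.
        raise ValueError("n_trials must be a multiple of 200")
    rng = random.Random(seed)
    opposite = {"left": "right", "right": "left"}
    trials = []
    for cue, per_mille in CUE_PER_MILLE.items():
        for _ in range(n_trials * per_mille // 1000):
            side = rng.choice(("left", "right"))  # assumed equiprobable
            arrow = {
                "valid": side,              # arrow points at the target side
                "invalid": opposite[side],  # arrow points away from the target
                "neutral": "both",          # double-headed arrow
                "none": None,               # no cue shown
            }[cue]
            trials.append({
                "cue": cue,
                "arrow": arrow,
                "target_side": side,
                "soa_ms": rng.choice(SOAS_MS),  # assumed equiprobable
                "target_type": rng.choice(("vowel", "consonant")),  # assumed
            })
    rng.shuffle(trials)
    return trials
```

With the default of 200 trials, the list contains 123 valid, 30 invalid, 30 neutral, and 17 no-cue trials.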


Working  Memory  Task.  A  spatial  working  memory  task  assessed  working  memory 
capacity (Figure 2; Greenwood et al., 2005). A fixation cross appeared for 500 ms followed by
one,  two,  or  three  black  dots  (1.65°  in  diameter,  each  indicating  a  target  location)  at  random 
screen  locations  for  500  ms.  Simultaneously  with  dot  offset,  the  fixation  cross  reappeared  for  3  s. 
At  the  end  of  the  delay,  a  single  red  test  dot  appeared  on  the  screen.  This  test  dot  appeared  either 
at the same location as one of the target dots (match condition) or at a different location (non-match
condition). On non-match trials, the distance between the correct location and the test dot
varied  over  three  levels  (~  1.3°,  2°,  or  2.6°  of  visual  angle).  Participants  indicated  whether  the 
test  dot  location  matched  one  of  the  target  dots  using  their  index  fingers  to  select  one  of  two 
responses  on  a  keyboard. 

Because  the  working  memory  task  generated  several  dependent  variables  (performance  at 
different  memory  loads),  a  composite  score  was  created  consisting  of  accuracy  on  trials  at  three 
levels  of  memory  load,  in  both  match  and  non-match  conditions.  Z-scores  were  computed  for 
each  of  the  six  conditions  and  a  mean  was  taken  to  form  a  composite  for  each  individual.  Thus, 
this  composite  score  was  not  standardized,  but  reflected  the  average  of  the  standardized  scores. 
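
The composite described above can be sketched in a few lines of Python. This is an illustration only, not the study's analysis code: the condition labels, the data layout, the use of the sample standard deviation, and the accuracy values are all assumptions made for the example.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one condition's accuracy scores across participants."""
    m, s = mean(values), stdev(values)  # sample SD is an assumption
    return [(v - m) / s for v in values]


def wm_composite(accuracy_by_condition):
    """Average each participant's z-scores over all conditions.

    accuracy_by_condition maps a condition label to a list of accuracy
    scores, one per participant, always in the same participant order.
    """
    standardized = [z_scores(scores) for scores in accuracy_by_condition.values()]
    # zip(*...) regroups the per-condition lists into per-participant tuples.
    return [mean(per_person) for per_person in zip(*standardized)]


# Hypothetical accuracy for three participants in the six conditions
# (3 memory loads x match/non-match); the numbers are invented.
example = {
    "load1_match": [0.95, 0.90, 0.80],
    "load1_nonmatch": [0.90, 0.85, 0.70],
    "load2_match": [0.90, 0.80, 0.75],
    "load2_nonmatch": [0.85, 0.80, 0.65],
    "load3_match": [0.80, 0.70, 0.60],
    "load3_nonmatch": [0.75, 0.70, 0.55],
}
composite = wm_composite(example)
```

Because each condition is standardized across participants before averaging, the composites sum to zero across the sample, but their standard deviation is generally not 1. That is the sense in which the composite is an average of standardized scores rather than a standardized score itself.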


[Figure 2 panel labels: Fixation (500 ms); Target (500 ms; load 3); 3 second delay; Test location]


Figure  2.  Working  memory  measure  used  to  indicate  spatial  working  memory  capacity  at  3 
levels  of  load. 

Artillery Sensor-to-Shooter Targeting Task

A  low-fidelity  software  simulation  of  an  artillery  sensor-to-shooter  targeting  system  was 
used with varying levels of automation support (Rovira et al., 2007). The artillery task consisted
of  three  components  in  separate  windows:  a  terrain  view,  a  task  window,  and  a  communications 
module  (Figure  3).  A  two-dimensional  terrain  view  of  a  simulated  battlefield  displayed  red 
enemy units (labeled E1, E2, ... Ex), yellow friendly battalion units (B1, B2, and B3), green
friendly artillery units (A1, A2, ... Ax), and one orange friendly headquarters unit (HQ). In the
task window, users made enemy-friendly engagement selections. The participants were required
to  identify  the  most  dangerous  enemy  target  and  select  a  corresponding  friendly  unit  to  engage  in 
combat  with  the  target,  known  as  an  enemy-friendly  engagement  selection. 


Figure  3.  The  sensor-to-shooter  task  interface,  shown  in  the  low  task  load,  medium-decision 
automation  condition. 


The  bottom  left  of  the  task  window  provided  varying  types  and  levels  of  automation 
support.  The  lowest  support  (above  the  fully  manual  condition)  was  information  automation, 
which  provided  a  list  of  all  possible  engagement  combinations,  including  the  distances  between 
enemy targets, friendly units, and headquarters. Because no explicit suggestion for decision
selection was provided, this corresponded to information automation in the Parasuraman et al.
(2000)  taxonomy.  The  next  level  of  automation,  low-decision  automation,  gave  a  list  of  all 
possible  engagement  combinations,  including  the  distances  between  enemy  targets,  friendly 
units,  and  headquarters;  however,  the  listings  were  prioritized  with  the  best  selection  first  and  the 
worst choice last, making this a form of decision automation. In the medium-decision automation
condition, the participant was provided the top three options for engagement, including the
distances  between  enemy  targets,  friendly  units,  and  headquarters  (Figure  3).  Unless  the  trial 
was  imperfect,  the  first  line  was  always  the  best  enemy-friendly  selection. 

Participants  could  either  follow  the  automation  (select  the  top  pairing  in  the  ordered  list)  or 
make  their  own  enemy-friendly  unit  engagement  selection,  but  were  required  to  make  a  decision 
within  10  s.  Participants  were  able  to  cross-verify  the  automation  by  reviewing  the  terrain  view. 
After they made their selection, or if 10 s had elapsed, the trial ended and the terrain map was
replaced  with  a  new  grid  of  enemy,  friendly,  and  HQ  units. 

To  increase  the  overall  difficulty  of  completing  the  sensor-to-shooter  task,  a  random  call 
sign  appeared  every  6  s  and  remained  displayed  until  the  next  call  sign.  Participants  were 
required  to  click  on  the  ANSWER  button  every  time  their  personal  call  sign  appeared  while  they 
were selecting units. Their call sign occurred at random intervals of between 50 and 90 seconds.

Experimental  Design 

A  4  (Automation  Support:  manual,  information  automation,  low-decision  automation, 
medium-decision  automation)  x  2  (Task  Load:  low,  high)  x  2  (Trial  Reliability:  reliable, 
imperfect)  within-subjects  design  was  used.  Task  load  was  manipulated  by  increasing  the 
number  of  friendly  and  enemy  units  from  three  to  six  each.  Trial  reliability  was  manipulated  for 
each  of  the  automation  support  conditions  and  referred  to  a  correct  automated  assessment 
(reliable)  versus  an  incorrect  automated  assessment  (imperfect).  Participants  were  informed  that 
although the automation was highly reliable, it was not 100% reliable (actual overall reliability
was  80%).  However,  no  further  information  on  reliability  was  given. 

Each  participant  practiced  with  each  of  the  eight  conditions:  manual  at  (1)  low  task  load 
and (2) high task load; information automation at (3) low task load and (4) high task load;
low-decision automation at (5) low task load and (6) high task load; and medium-decision automation at
(7)  low  task  load  and  (8)  high  task  load.  During  practice,  participants  completed  trials  at  both 
task  load  conditions  for  each  level  of  automation  support  tool  before  a  new  level  of  automation 
support  was  introduced.  After  completing  the  practice  trials,  participants  completed  8  blocks 
with each block representing a particular combination of task load (low, high; counterbalanced)
and automation support (manual, information automation, low-decision, medium-decision;
counterbalanced  via  partial  Latin  square).  Each  block  consisted  of  40  trials.  In  all,  each 
participant  completed  320  test  trials. 
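
One way to generate such a block order is sketched below. The report does not specify which Latin square was used or how task load order was assigned, so the cyclic square, the alternation rule for task load, and the function names are all assumptions chosen for illustration.

```python
AUTOMATION = ["manual", "information", "low-decision", "medium-decision"]
TASK_LOAD = ["low", "high"]
TRIALS_PER_BLOCK = 40


def latin_square(items):
    """Cyclic Latin square: each item appears once per row and once per column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def block_order(participant_index):
    """Return the eight (automation, task load) blocks for one participant.

    Assumed scheme: the row of the square cycles with the participant
    index, and task load order flips after every full pass through the rows.
    """
    square = latin_square(AUTOMATION)
    automation_order = square[participant_index % len(square)]
    flip = (participant_index // len(square)) % 2 == 1
    loads = TASK_LOAD[::-1] if flip else TASK_LOAD
    return [(auto, load) for auto in automation_order for load in loads]


order = block_order(5)  # hypothetical participant index
total_trials = len(order) * TRIALS_PER_BLOCK
```

Under this scheme every participant still completes all eight combinations, so the total of 8 blocks x 40 trials = 320 test trials is preserved regardless of the order assigned.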

Dependent  variables  included  the  accuracy  and  speed  of  enemy-friendly  engagement 
selections. Accuracy was calculated as the percentage of trials in which the participant correctly
selected  the  most  dangerous  enemy  target  and  a  corresponding  friendly  unit  to  engage  in  combat. 
Secondary  task  measures  of  performance  included  accuracy  on  the  communications  (call  sign) 
task. To obtain subjective measures of mental workload, participants completed a computerized
version of the NASA Task Load Index (TLX) after each block (Hart & Staveland, 1988).

Participants also rated their trust in automation after each automation-present block
(history-based trust) using an on-screen visual analog scale ranging from 0 to 100 (adapted from Lee &
Moray, 1994) and at the completion of the study (dispositional trust; adapted from Jian, Bisantz,
& Drury, 2000).

RESULTS 

Repeated  measures  analyses  of  variance  (ANOVAs)  were  conducted  to  evaluate  effects 
of  automation  support,  task  load,  and  trial  reliability  on  performance,  subjective  mental 
workload, and trust. Multilevel linear models were conducted to measure the role of individual
differences in cognitive ability on task performance under the various manipulations.

Manual Control versus Automation Support

Decision-making  accuracy  was  computed  under  manual  control  and  automation  support. 
For these analyses, we collapsed across the three forms of automation support (information,
low-decision, and medium-decision) and then segregated by trial reliability (reliable, imperfect).
Figure 4 shows that performance with reliable automation improved compared to manual and
degraded with imperfect automation and high task load. A 3 (automation support: manual,
reliable automation, imperfect automation) x 2 (task load: low, high) repeated measures ANOVA
revealed a main effect of automation type, F(1,84) = 272.?, p < .05, ηp² = .76, task load,
F(1,85) = 82.0, p < .05, ηp² = .49, and the interaction between the two, F(1,84) = 51.9, p < .05,
ηp² = .38.


[Figure 4 plot; x-axis: Task load]

Figure  4.  Decision  making  accuracy  as  a  function  of  task  load  and  automation  support.  Bars 
indicate  standard  error. 


Pairwise comparisons showed that the interaction was due to performance
decrements with increased task load with manual control (low task load M = .15, SD = .14; high task
load M = .61, SD = .16) and reliable automation (low task load M = .88, SD = .01; high task load
M = .79, SD = .10), but not with imperfect automation (p < .05). Due to space limits, RT data were not
included, though they varied similarly to the accuracy data and did not demonstrate a speed-accuracy
trade-off.

Multilevel  Models 

A  two-level  hierarchical  model  assessed  the  effects  of  the  within-person  variables  of 
automation  support,  task  load,  automation  reliability,  the  between-person  predictor  of  working 
memory  score,  and  their  interactions  on  decision-making  accuracy  in  the  sensor-to-shooter  task. 

It  was  expected  that  decision-making  accuracy  would  be  related  to  automation  support,  task  load, 
reliability,  and  individual  differences  in  working  memory  ability.  Automation  support  was 
included  as  an  interval-level  variable  in  the  model. 

Multiple  responses  were  nested  within  the  85  participants  as  each  participant  performed 
the  sensor-to-shooter  task  under  varying  automation  support,  task  load,  and  trial  reliability. 
Accuracy  represented  the  ratio  of  their  correct  to  incorrect  trials  under  each  combination  of  those 
conditions.  These  scores  were  nested  in  the  within-person  manipulations  (automation  support, 
task  load,  reliability),  which  were  in  turn  nested  within  the  attributes  of  the  participant  (working 
memory  ability).  These  nested  observations  were  unlikely  to  be  independent,  violating  the 
independence  of  error  variances  assumption  of  logistic  regression  (e.g.,  responses  by  a 
participant  are  likely  to  be  correlated  to  that  person’s  ability).  MLMs  are  preferred  over 
regression  especially  for  within-subjects  experimental  designs  that  produce  hierarchically 
structured  data  (Raudenbush  &  Bryk,  2002).  Multivariate  regression  ignores  this  hierarchical 
structure,  or  nesting,  which  can  lead  to  inflated  Type  I  error  rates  (Hox  &  Bechger,  1998; 
Tabachnick & Fidell, 2007), while MLM allows each individual to act as his or her own control,
accounts for variability between and within participants, and allows for examination of cross-level
interactions (Raudenbush & Bryk, 2002). Hoffman and Rovine (2007) provided an accessible
discussion of the usefulness of multilevel linear models in human factors research. Multilevel
modeling was implemented using PROC MIXED through SAS, version 9.4.

Table 1. Unstandardized Coefficients of Multilevel Models of the Within- and Between-person
Effects of Predictors on Accuracy in a Sensor-to-Shooter Task

                                          Model 1             Model 2             Model 3
                                          Unconditional       Random Coefficients Slopes and
                                          Model               Regression          Intercepts
                                          Estimate    SE      Estimate    SE      Estimate    SE
Fixed effects
  Intercept                                0.554 ***  0.014    0.346      0.038    0.346 ***  0.038
Between-person
  Working Memory Composite Score (WM)                                              0.114      0.060
Within-person
  Automation Support (AutoSupp)                               -0.029      0.017   -0.032      0.017
  Task load                                                    0.146 **   0.052    0.149 **   0.051
  Reliability                                                  0.276      0.051    0.274 ***  0.051
  Task load x AutoSupp                                        -0.081 ***  0.024   -0.080 ***  0.024
  Task load x Reliability                                     -0.501 ***  0.073   -0.502 ***  0.072
  AutoSupp x Reliability                                       0.154      0.024    0.159 ***  0.024
  Task load x AutoSupp x Reliability                           0.210 ***  0.034    0.209 ***  0.033
Cross-level
  Task load x WM                                                                  -0.036      0.065
  AutoSupp x WM                                                                    0.011      0.026
  Reliability x WM                                                                 0.080      0.065
  Task load x AutoSupp x WM                                                        0.013      0.030
  Reliability x AutoSupp x WM                                                     -0.089 **   0.030
Random effects
  σ²                                       0.149      0.007    0.049      0.002    0.047      0.002
  τ00                                      0.005      0.003    0.013      0.003    0.011      0.002
Model fit statistic
  AIC                                      972.2               -12.6               -28.6

Note. * p < .05, ** p < .01, *** p < .001; Working memory composite score was grand-mean centered.
SE indicates standard error.

Model 1: No predictors. A fully unconditional model (Table 1: Model 1) was first used to
discover the amount of within- and between-person variance in accuracy and provide a baseline
to assess the fit of multivariate models (Models 2 and 3). The unconditional model revealed
significant variance at both levels, with 97% of the variance at the within-person level (σ² =
0.149, z = 21.48, p < .001) and 3% of the variance at the person level (τ00 = 0.005, z = 1.81, p
= .034).
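
The 97% and 3% figures follow from the two variance components reported for the unconditional model (0.149 within-person, 0.005 between-person). A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def variance_shares(within_variance, between_variance):
    """Split total variance into within-person and between-person proportions."""
    total = within_variance + between_variance
    return within_variance / total, between_variance / total


# Variance components as reported for Model 1.
within_share, between_share = variance_shares(0.149, 0.005)
print(f"within-person: {within_share:.0%}, between-person: {between_share:.0%}")
```

The between-person proportion is the intraclass correlation; here it is small, meaning most of the variability in accuracy lies across conditions within a participant rather than between participants.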

Model  2:  Within-person  variables.  Model  2  examined  the  effects  of  the  within-person 
manipulations on accuracy (Table 1). When using this model that included the within-subjects
manipulated variables, 67% of the 97% within-subject variance was accounted for. Model fit
using  the  Akaike's  information  criterion  (AIC)  improved  from  976.2  to  -12.6  (lower  values 
indicate  better  fit). 
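
The 67% figure is consistent with the proportional reduction in the within-person residual variance from Model 1 to Model 2. The sketch below assumes the residual variance estimates in Table 1 are 0.149 (Model 1) and 0.049 (Model 2); the function name is hypothetical.

```python
def proportion_variance_explained(baseline_variance, model_variance):
    """Proportional reduction in residual variance relative to a baseline model."""
    return (baseline_variance - model_variance) / baseline_variance


# Within-person residual variance: Model 1 (no predictors) vs. Model 2.
within_explained = proportion_variance_explained(0.149, 0.049)
print(f"within-person variance explained: {within_explained:.0%}")
```

This is a pseudo-R² for the within-person level only; it says nothing about the between-person variance, which is addressed by adding working memory in Model 3.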

The  model  revealed  a  significant  three-way  interaction  of  automation  support,  task  load, 
and  trial  reliability  (Table  1).  Data  were  divided  into  reliable  and  imperfect  trials  to  examine  the 
effects  and  interactions  of  automation  support  and  task  load  and  decompose  the  interaction.  For 
reliable automation, pairwise comparisons showed that increased task load decreased accuracy
only under information automation support (from M = .70, SD = .19 to M = .44, SD = .25; p < .05).
This can be seen in the left panel of Figure 5, where accuracy in the information automation
condition declined as task load increased while low- and medium-decision automation accuracy were
unaffected. For trials with imperfect automation, pairwise comparisons showed that increasing
task load significantly decreased accuracy only with medium-decision automation (from M = .29,
SD = .31 to M = .16, SD = .26; p < .05). This decline in the medium-decision condition with
increased task load can be seen in the right panel of Figure 5.


[Figure 5 panels: Reliable (left), Imperfect (right); x-axis in each panel: low task load, high task load]


Figure  5.  Decision  accuracy  as  a  function  of  trial  reliability,  task  load,  and  automation  support. 
Bars indicate standard error.

In sum, the source of the 3-way interaction of reliability, automation condition, and task
load  appears  to  be:  First,  there  was  a  large  effect  of  reliability,  where  imperfect  automation 
harmed  performance.  Further,  imperfect  automation  harmed  performance  more  with  higher 
automation  support  while  reliable  automation  improved  performance  with  increasing  automation 
support.  Last,  these  effects  were  exacerbated  by  task  load,  which  had  less  of  an  effect  on 
performance  when  automation  was  reliable  and  the  most  effect  on  performance  when  automation 
was  unreliable  and  at  increased  automation  support. 

Model  3:  Cross-level  interactions.  We  expected  individuals  with  higher  working  memory 
capacity  to  show  less  of  a  decrement  with  higher  forms  of  imperfect  decision  automation  as 
compared  to  individuals  with  less  working  memory  capacity.  Specifically,  with  imperfect 
automation  or  high  task  load  it  was  predicted  that  the  benefits  of  better  working  memory 
capacity  would  be  highlighted  (equation  available  in  appendix).  A  third  model  was  conducted  to 
include  working  memory  ability  to  examine  these  hypothesized  cross-level  interactions. 

Attentional ability was not included in the model as it showed no correlation with accuracy (r =
-.07, p > .05).

The model revealed a 3-way cross-level interaction of reliability, automation support, and
working  memory  ability  (Table  1).  Model  fit  using  AIC  improved  from  -12.6  to  -28.6,  indicating 
the  benefit  of  considering  individual  differences  in  working  memory  on  accuracy  with 
automation.  To  decompose  the  interaction,  data  were  divided  into  reliable  and  imperfect  trials  to 
examine  the  effects  and  interactions  of  automation  support  and  working  memory  (Figure  6). 

Task load was controlled for in these models.

Figure 6. Proportion of correct responses across automation support. Points were plotted by
calculating  estimates  based  on  low  (1  SD  below  the  mean)  and  high  (1  SD  above  the  mean) 
values  for  the  composite  working  memory  measure.  Note  that  both  groups  had  high  accuracy 
overall  with  more  reliable  automation. 

When  automation  was  reliable,  simple  slopes  analyses  revealed  that  low  and  high 
working  memory  participants  differed  significantly  from  each  other  with  varying  automation 
support (t(1,410) = -8.01, p < .001 and t(1,410) = -12.87, p < .001). Significance contrasts also
revealed that accuracy differed between low and high working memory participants for
information automation support (t(1,410) = -11.6, p < .001) and the predicted data points for high
decision support (e.g., beyond medium-decision support) (t(1,410) = -17.57, p < .001).

When  automation  was  imperfect,  those  with  high  working  memory  were  able  to 
maintain some level of performance (approximately 52% higher than low working memory participants),
although, just as for those with lower working memory, their accuracy declined as automation
support increased. A simple slopes analysis revealed that low and high working memory
participants differed significantly from each other at all types and levels of automation support
(ps < .001) and accuracy declined as automation support increased (ps < .01).

In  sum,  working  memory  interacted  with  reliability  and  automation  support  to  affect 
accuracy.  When  automation  was  reliable,  increasing  automation  support  resulted  in  higher 
accuracy  for  all  participants.  When  automation  was  imperfect,  the  reverse  was  true:  increasing 
automation  support  resulted  in  worse  accuracy.  This  was  especially  true  for  those  with  lower 
working  memory  ability.  These  results  show  that  those  with  higher  working  memory  were  less 
susceptible  to  the  detrimental  effects  of  increasingly  supportive  but  imperfect  automation. 

Trust 

History-based trust (after every block). The multivariate main effects of automation
support and task load were significant, Wilks' lambda = .37, F(8,66) = 13.9, p < .05, ηp² = .63, and
Wilks' lambda = .82, F(4,70) = 3.81, p < .05, ηp² = .18, respectively. The interaction of task load and
automation support was significant, Wilks' lambda = .64, F(8,66) = 4.56, p < .05, ηp² = .36.

Follow-up  pairwise  tests  showed  that  the  source  of  the  interaction  was  a  significant  decrease  in 
self-reported  reliance  (question  2)  and  decrease  in  the  belief  that  automation  improved 
performance  (question  4)  when  task  load  increased  but  only  in  the  information  automation  and 
low-decision  automation  conditions. 
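
The effect sizes reported for the multivariate tests are consistent with the usual conversion from Wilks' lambda to partial eta squared, 1 - lambda^(1/s). The sketch below assumes s = 1, which is the value that reproduces the reported figures; the report itself does not state how the effect sizes were computed, and the function name is hypothetical.

```python
def partial_eta_squared_from_wilks(wilks_lambda, s=1):
    """Multivariate partial eta squared, 1 - lambda**(1/s); s = 1 is assumed here."""
    return 1 - wilks_lambda ** (1 / s)


# Wilks' lambda values as reported for the history-based trust analysis.
reported_lambdas = {
    "automation support": 0.37,
    "task load": 0.82,
    "task load x automation support": 0.64,
}
effect_sizes = {
    effect: round(partial_eta_squared_from_wilks(value), 2)
    for effect, value in reported_lambdas.items()
}
```

With s = 1 the conversion reduces to 1 - lambda, giving .63, .18, and .36 for the three effects.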

Correlations between trust measures and abilities were computed to examine the effects
of individual differences in working memory and attention on trust. In the information
automation condition, having better attentional control (lower attention costs) was associated
with more positive beliefs about automation (trust, reliance; r = -.16, r = -.13, respectively, all
ps < .05). However, in the low-decision automation condition, attention was no longer correlated
with trust, but working memory was negatively correlated with trust such that better working memory
(higher WM scores) was associated with lower trust, reliance, and beliefs that automation improved
performance (r = -.14, r = -.23, and r = -.13, respectively, all ps < .05). Finally, in the
medium-decision condition, working memory was negatively correlated with reliance, showing that
higher working memory was associated with less self-reported reliance on automation (r = -.14).

Dispositional trust. Trust negatively correlated with working memory (lower working
memory scores were associated with more agreement with positive statements about automation;
r = -.22, p < .05) while distrust positively correlated with working memory (higher working
memory was associated with more agreement with negative statements about automation;
r = .24). Attention was not correlated with positive or negative statements about automation.

Subjective Ratings of Mental Workload

Lower automation support resulted in higher mental workload, and high task load increased
mental workload except with the highest form of automation support
(Figure 7). A 4 (automation support: manual, information, low-decision, medium-decision) x 2
(task load: low, high) repeated measures ANOVA revealed a main effect of automation support,
F(3,219) = 61.7, p < .05, ηp² = .46, task load, F(1,73) = 44.1, p < .05, ηp² = .38, and the interaction
between automation support and task load, F(3,219) = 6.7, p < .05, ηp² = .08. Pairwise
comparisons showed that the source of the interaction was an effect of task load on perceived
workload (higher task load resulted in higher perceived workload) for manual, F(1,73) = 25.7,
p < .05, ηp² = .26, information automation, F(1,73) = 33.7, p < .05, ηp² = .32, and low-decision
automation, F(1,73) = 4.3, p < .05, ηp² = .06, but not with medium-decision automation.
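
As a consistency check on the workload statistics, partial eta squared can be recovered from each F ratio and its degrees of freedom as (F x df_effect) / (F x df_effect + df_error). The sketch below applies this to the values as read from the paragraph above, where some symbols are garbled in the source; the function and variable names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (effect, F, df_effect, df_error, reported partial eta squared)
reported = [
    ("automation support", 61.7, 3, 219, 0.46),
    ("task load", 44.1, 1, 73, 0.38),
    ("automation support x task load", 6.7, 3, 219, 0.08),
    ("task load, manual", 25.7, 1, 73, 0.26),
    ("task load, information automation", 33.7, 1, 73, 0.32),
    ("task load, low-decision automation", 4.3, 1, 73, 0.06),
]

recomputed = {
    name: round(partial_eta_squared(f, df1, df2), 2)
    for name, f, df1, df2, _ in reported
}
```

All six recomputed values round to the effect sizes given in the text, which supports the reading of the garbled statistics used here.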

Figure 7. Mental workload as a function of automation support and task load.

DISCUSSION 

The  extent  to  which  automation  enhances  decision-making  depends  on  individual 
differences  in  cognitive  ability.  Using  a  simulated  automated  targeting  task,  we  showed  that  the 
extent  to  which  an  operator  experienced  both  the  costs  of  imperfect  automation  and  the  benefits 
of  reliable  automation  depended  on  individual  differences  in  working  memory.  This  finding  may 
help  optimize  human-automation  interaction. 

Our study replicated prior findings that operators perform better with reliable
automation compared to manual control (hypothesis 1a). In addition, task load did not
differentiate performance when the automation was reliable (hypothesis 1b). Finally, our study
showed that with imperfect automation, there was no difference in accuracy with information
automation and low-decision automation between low and high task load, but accuracy declined
at high task load with medium-decision automation (hypothesis 1c). These results demonstrate an
interesting difference between lower automation (information and low-decision) and higher
automation  (medium-decision).  It  appears  that  lower  automation  can  mitigate  some  of  the 
performance  penalty  of  increased  task  load  when  automation  is  imperfect  while  performance 
significantly  declines  with  imperfect  and  higher  automation. 

A  critical  hypothesis  regarded  the  role  of  individual  differences  and  automation 
performance (hypothesis 2). The MLM showed a cross-level interaction between working
memory,  reliability,  and  automation  support.  Performance  was  generally  positively  affected  by 
increasing  automation  but  especially  for  those  with  low  working  memory.  Indeed,  with  reliable 
automation  support  above  information  automation,  working  memory  did  not  differentiate 
accuracy.  Low  and  medium-decision  automation  may  have  reduced  the  working  memory 
demands  of  the  task.  Thus,  reliable  and  increased  automation  support  was  especially  beneficial 
for  those  with  lower  working  memory  (with  maximal  differences  by  working  memory  at 
information  automation). 

When  automation  was  imperfect,  those  with  low  and  high  working  memory  showed 
declines  in  accuracy  as  the  type  and  level  of  automation  increased.  However,  those  with  lower 
working  memory  were  more  severely  impacted  by  the  unreliability  than  those  with  higher 
working  memory.  Taken  together,  these  results  confirmed  hypothesis  2  regarding  the  effects  of 
type  and  level  of  automation  and  working  memory.  These  results  also  added  detail  to  the 
conventional  wisdom  that  increasing  automation  benefits  performance  but  can  lead  to 
catastrophic performance when automation is imperfect (i.e., the lumberjack effect; Onnasch et
al., 2014). When automation was reliable, those with higher working memory benefitted more
than  those  with  lower  working  memory,  and  when  automation  was  imperfect,  those  with  lower 
working memory suffered more than those with higher working memory. These results
confirmed the link between automation performance and individual differences in working
memory as suggested by previous researchers (Parasuraman et al., 2010; Parasuraman, 2012),
but also extended the literature by further specifying the automation conditions (type and level of
automation  support  and  reliability)  under  which  working  memory  affects  performance. 

The  lack  of  any  effect  of  attention  on  performance  was  puzzling  (hypothesis  4).  There 
may  be  several  differences  that  explain  our  disparate  results.  First,  Chen  and  Terrence  (2009) 
used  a  subjective  “perceived  attentional  control”  measure  to  assess  attentional  ability  whereas  we 
used  a  spatial  cueing  task.  Second,  Chen  and  Terrence  manipulated  reliability  by  adjusting  false 
alarms  and  miss  rates  while  our  task  paradigm  did  not  allow  for  false  alarms  or  miss  rates  (the 
automation  was  always  on  in  the  automation-present  conditions).  Third,  the  choice  of  attention 
measure (a spatial cueing task) and resultant dependent variable (attentional cost from median
reaction time) may not have been as sensitive an indicator of individual differences in attention in
our sample of college students as it was in middle-aged adults (Greenwood et al., 2005). Further,
although  multiple  targets  needed  to  be  kept  in  memory  during  the  sensor-to-shooter  task,  there 
were  no  distractor  targets  on  the  screen,  meaning  the  task  did  not  require  high  levels  of 
attentional  control. 

More  broadly,  these  results  should  be  put  into  context  with  some  other  possible 
limitations that may affect the generalizability of the results. First, though college students are
typical participants, our sample was from a military academy, which possibly made them less
representative. That said, the automated task was a simulated command and control task and the
participants had completed Army basic training.

Practical Implications

Our ability to predict how operators will perform with reliable but imperfect types and levels of
automation at different task loads is enhanced if we understand the impacts of individual
differences in working memory and attention on human-automation interaction. One way this
knowledge may be useful is in automated systems that alter the types and levels of automation
support based on the operator's working memory ability, so-called adaptive automation. Both
working  memory  and  attentional  capacity  may  change  as  a  function  of  the  current  task  load.  Our 
results  provide  some  information  that  suggests  how  different  levels  of  working  memory  and 
attention  affect  performance.  These  results  also  provide  some  guidance  in  the  design  of  new 
automated  systems.  These  results  showed  that  the  level  of  working  memory  demand  varies  as  a 
function  of  the  type  and  level  of  automation  and  automation  reliability.  Finally,  our  results 
suggest  that  designers  should  design  interfaces  that  support  individuals  by  matching  their 
working  memory  abilities. 


Key  Points 

•  It  was  hypothesized  that  individual  differences  in  working  memory  and  attention  would 
affect  human  automation  interaction  with  varying  types  and  levels  of  imperfect 
automation  or  high  task  load  in  a  simulated  command  and  control  task. 

•  Participants  performed  a  simulated  command  and  control  task  with  manual,  information 
automation, low-decision automation, or medium-decision automation at two levels
of task load: low or high. Participants also completed a spatial working memory task and
a  visuospatial  attention  task. 

•  Increased,  reliable  automation  support  reduced  the  differences  between  those  with  low 
and  high  working  memory  abilities.  Higher  working  memory  ability  buffered  the  costs  of 
imperfect  decision  automation.  Lower  working  memory  was  associated  with  more  trust  in 
automation. 

•  Designers  may  mitigate  some  of  the  performance  decrements  experienced  with  imperfect 
automation  by  designing  interfaces  that  support  individual  differences  in  working 
memory  and  attention. 
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APPENDIX 
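Equation for Model 1
Level 1: $\text{Accuracy}_{it} = \beta_{0it} + r_{it}$
Level 2: $\beta_{0i} = \gamma_{00} + u_{0i}$

Equation for Model 2
Level 1: $\text{Accuracy}_{it} = \beta_{0it} + \beta_{1it}(\text{Task load}) + \beta_{2it}(\text{AutoSupport}) + \beta_{3it}(\text{Reliab}) + \beta_{4it}(\text{Task load} \times \text{AutoSupport}) + \beta_{5it}(\text{AutoSupport} \times \text{Reliab}) + \beta_{6it}(\text{Reliab} \times \text{Task load}) + \beta_{7it}(\text{AutoSupport} \times \text{Reliab} \times \text{Task load}) + r_{it}$
Level 2:
$\beta_{0i} = \gamma_{00} + u_{0i}$
$\beta_{1i} = \gamma_{10}$
$\beta_{2i} = \gamma_{20}$
$\beta_{3i} = \gamma_{30}$
$\beta_{4i} = \gamma_{40}$
$\beta_{5i} = \gamma_{50}$
$\beta_{6i} = \gamma_{60}$
$\beta_{7i} = \gamma_{70}$

Equation for Model 3
Level 1: $\text{Accuracy}_{it} = \beta_{0it} + \beta_{1it}(\text{Task load}) + \beta_{2it}(\text{AutoSupport}) + \beta_{3it}(\text{Reliab}) + \beta_{4it}(\text{Task load} \times \text{AutoSupport}) + \beta_{5it}(\text{AutoSupport} \times \text{Reliab}) + \beta_{6it}(\text{Reliab} \times \text{Task load}) + \beta_{7it}(\text{AutoSupport} \times \text{Reliab} \times \text{Task load}) + r_{it}$
Level 2:
$\beta_{0i} = \gamma_{00} + \gamma_{01}(\text{WM}) + u_{0i}$
$\beta_{1i} = \gamma_{10} + \gamma_{11}(\text{WM})$
$\beta_{2i} = \gamma_{20} + \gamma_{21}(\text{WM})$
$\beta_{3i} = \gamma_{30} + \gamma_{31}(\text{WM})$
$\beta_{4i} = \gamma_{40} + \gamma_{41}(\text{WM})$
$\beta_{5i} = \gamma_{50} + \gamma_{51}(\text{WM})$
$\beta_{6i} = \gamma_{60}$
$\beta_{7i} = \gamma_{70}$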
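
As a reading aid, the Level 2 equations of Model 3 can be substituted into its Level 1 equation to give a single combined equation. This is a sketch under two assumptions that the report does not state: $X_{kit}$ is shorthand introduced here for the k-th Level 1 predictor in the order listed (Task load, AutoSupport, Reliab, then the four interaction terms), and the Level 1 residual is written $r_{it}$.

```latex
\begin{aligned}
\text{Accuracy}_{it} ={}& \gamma_{00} + \gamma_{01}\,\text{WM}_i
  + \sum_{k=1}^{5} \bigl(\gamma_{k0} + \gamma_{k1}\,\text{WM}_i\bigr)\, X_{kit} \\
 &+ \gamma_{60}\, X_{6it} + \gamma_{70}\, X_{7it} + u_{0i} + r_{it}
\end{aligned}
```

In this form, each $\gamma_{k1}$ term is a cross-level interaction between working memory and the corresponding Level 1 predictor. Model 2 follows by dropping every WM term, and Model 1 by also dropping all predictors.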
The Effects of Age and Working Memory Demands on Automation-Induced Complacency

William  Leidheiser  &  Richard  Pak 
Clemson  University 
Department  of  Psychology 

Complacency  refers  to  a  type  of  automation  use  expressed  as  insufficient  monitoring  and  verification  of 
automated  functions.  Previous  studies  have  attempted  to  identify  the  age-related  factors  that  influence 
complacency  during  interaction  with  automation.  However,  little  is  known  about  the  role  of  age-related 
differences  in  working  memory  capacity  and  its  connection  to  complacent  behaviors.  The  current  study 
aims  to  examine  whether  working  memory  demand  of  an  automated  task  and  age-related  differences  in 
cognitive  ability  influence  complacency.  Higher  degrees  of  automation  (DOA)  have  been  shown  to  reduce 
cognitive  workload  and  may  be  used  to  manipulate  working  memory  demand  of  a  task.  Thus,  we 
hypothesize  that  a  lower  DOA  (i.e.  information  acquisition  stage  with  lower  level)  will  demand  more 
working  memory  than  a  higher  DOA  (i.e.  decision  selection  stage  with  higher  level)  and  that  a  lower  DOA 
will  result  in  a  greater  difference  in  complacency  between  age  groups  than  a  higher  DOA. 


INTRODUCTION 

The  World  Health  Organization  (WHO,  2011)  estimates 
that  by  2050,  there  will  be  approximately  1.5  billion  elderly 
(age  65  and  over)  in  the  world.  A  host  of  automated  services 
and  devices  are  or  will  be  designed  to  help  older  adults 
maintain  independence  (e.g.,  medication  reminder  apps). 
Despite this availability of automation and its seeming
utility for maintaining independent living (Haigh & Yanco, 2002),
research has shown that older adults may be more complacent
with automated systems than younger age groups (so-called
automation-induced complacency).

Automation-induced  complacency  is  the  “self-satisfaction 
that  may  result  in  non-vigilance  based  on  an  unjustified 
assumption  of  satisfactory  system  state”  (Billings,  Lauber, 
Funkhouser,  Lyman,  &  Huff,  1976).  It  is  the  state  in  which  a 
user  fails  to  notice  imperfect  automation.  The  fault  is  not 
detected  because  the  user  is  poorly  monitoring  the  system, 
which  can  result  in  acceptable  performance  with  reliable 
automation  or  diminished  performance  with  unreliable 
automation  (Parasuraman  &  Manzey,  2010).  For  instance,  an 
older  adult  with  diabetes  may  monitor  their  blood  glucose 
levels  with  an  automated  tool.  If  the  older  adult  perceives  the 
device  as  reliable  and  trusts  that  the  blood  glucose  readings 
are accurate, they may rely on the reading even when the device
starts to falter. As older adults begin to adopt automated technologies,
it  is  important  to  understand  the  age-related  factors  that 
contribute  to  increased  complacency  and  the  performance 
costs  associated  with  those  behaviors. 

Older Adults, Working Memory, and Complacency

Older  adults  have  been  found  to  be  more  complacent  with 
automation  relative  to  younger  adults  (Ho,  Wheatley,  & 
Scialfa,  2005b).  Various  studies  have  suggested  several 
possible explanations for older adults' increased complacency.
Person-related variables include higher levels of trust
(Johnson, Sanchez, Fisk, & Rogers, 2004; Pak, Fink, Price,
Bass, & Sturre, 2012) and age-related differences in abilities
(e.g., working memory; Ho et al., 2005b), while system-related
variables include the reliability of the automation (Sanchez
et al., 2004) and workload (McBride, Rogers, & Fisk, 2011).

Research investigating age differences in cognitive ability
as a possible explanation for changes in complacency has found
that in an automated task with high working memory demands,
older adults relied more on the automation, committed more
errors, had greater trust in the system, and were less confident
in their own abilities compared to younger adults (Ho et al.,
2005b). Based on these findings, the authors concluded that
age-related differences in working memory might be a potential
reason for age differences in complacency, given the
memory-dependent automated task. For instance, the younger adults
were able to hold more information about the task in their
working memory (Ho et al., 2005b). Since they could actively
store and recall this information when needed, younger adults
could more easily identify an automation failure compared to
older adults.

Researchers  theorized  there  are  two  main  factors  that 
contribute  to  older  adults’  complacent  behavior  with 
automated  technologies  (Ho,  Kiff,  Plocher,  &  Haigh,  2005a). 
The  first  is  that  while  using  automation,  older  adults  form  an 
inaccurate  mental  representation  of  the  correct  values  used  in 
the  decision  making  process  due  to  reduced  working  memory 
capacity.  The  second  is  that  due  to  their  reduced  working 
memory  capacity,  older  adults  are  unable  to  judge  the 
accuracy  of  automation.  In  both  cases,  it  is  assumed  older 
adults’  relative  complacency  with  automation  is  due  to  a 
mismatch  between  the  working  memory  demands  of  the  task 
and working memory capacity of the person (Ho et al., 2005a).
If  working  memory  capacity  plays  such  a  central  role  in 
automation  complacency,  we  should  observe  the  opposite 
relationship  as  well:  reduced  complacency  in  older  adults 
when  the  automation  has  been  designed  to  demand  relatively 
less  working  memory  resources  (or  working  memory 
resources  are  less  constrained).  The  design  of  Ho  et  al.’s 
(2005b)  study  precludes  this  determination  because  it  is 
unclear  whether  the  high  working  memory  demands  of  the 
task  or  the  degree  of  automation  (DOA)  contributed  to  the 
difference  in  complacency. 
How Complacency is Influenced by Automation-Related
Factors 

Reliability.  Automation  reliability  is  the  overall  accuracy 
of  the  system  and  is  an  important  factor  of  automation-induced 
complacency  because  the  number  of  errors  it  produces  can 
impact  dependence  on  automation. 

Across different levels of reliability, age is known to
influence trust in automation. For instance,
several  studies  found  that  higher  reliability  led  to  higher 
subjective  trust  in  the  system  for  both  age  groups,  but  older 
adults  had  significantly  higher  trust  than  younger  adults 
(Sanchez  et  al.,  2004;  Ho  et  al.,  2005b).  Highly  reliable 
automation  is  problematic  because  users  can  become 
accustomed  to  its  high  level  of  performance  and  may  not 
expect  it  to  fail. 

Research  on  age  differences  in  automation  use  has  found 
that  older  adults  tend  to  overestimate  the  actual  automation 
reliability (Olson et al., 2009). Given known age-related
differences in working memory, older adults have difficulty
detecting errors and perceiving overall automation performance.
A combination of unnecessarily high trust in the system and
limited working memory may produce a lack of awareness of
errors consistent with complacent behavior.

Workload.  The  workload  or  demand  of  a  task  can  be 
taxing  on  an  individual’s  cognitive  resources,  especially  when 
a  task  is  performed  over  a  long  period  of  time.  Greater 
complacency  has  been  shown  in  a  multitask  environment 
instead  of  a  single  task  or  monitoring  role  for  younger  adults 
(Parasuraman,  Molloy,  &  Singh,  1993). 

Older  adults  have  a  greater  tendency  to  monitor 
automation  and  verify  the  accuracy  of  the  information,  even 
under  taxing  conditions  (Ho  et  al.,  2005b).  Exerting  more 
cognitive  resources  to  complete  a  task  may  lead  the  user  to 
rely  on  automation  after  task  demands  become  too  difficult  to 
manage.  There  are  also  age  differences  in  complacency  that 
have  occurred  under  equivalent  high  workload  conditions, 
where  older  adults  display  greater  complacency  than  younger 
adults  (Hardy,  Mouloua,  Dwivedi,  &  Parasuraman,  1995; 
Vincenzi, Muldoon, Mouloua, Parasuraman, & Molloy, 1996;
Ho  et  al.,  2005b).  If  workload  only  partially  contributes  to 
increases  in  complacency,  other  age-related  factors  must  be 
involved  as  well. 

Working  memory  capacity  has  been  found  to  significantly 
predict  younger  adult  performance  in  an  automated  task  with 
varying  workload  (de  Visser,  Shaw,  Mohamed-Ameen,  & 
Parasuraman,  2010).  Since  working  memory  plays  a  role  in 
predicting  performance,  this  cognitive  ability  may  explain 
some  age-related  differences  in  complacent  behaviors. 

Degree  of  Automation.  Automation  comes  in  a  variety  of 
forms,  which  can  execute  different  functions  for  the  user 
based  on  their  capabilities  and  limitations.  However, 
automation  is  not  simply  an  all  or  none  concept  because  any 
individual  task  can  feature  varying  degrees  of  automation  that 
take  into  account  the  use  of  various  stages  and  levels 
(Wickens,  Li,  Santamaria,  Sebok,  &  Sarter,  2010). 

Parasuraman,  Sheridan,  and  Wickens  (2000)  identified 
several stages of automation that are based on an existing
model of human information processing: information
acquisition  (stage  1),  information  analysis  (stage  2),  decision 
and  action  selection  (stage  3),  and  action  implementation 
(stage  4).  Each  stage  is  designed  to  support  a  different  aspect 
of  the  cognitive  process. 

Levels  of  automation  differ  from  stages  because  they 
affect  the  role  of  humans  and  automated  systems  in  a  given 
task.  These  levels  exist  on  a  spectrum  of  automation,  where 
each  level  between  manual  and  fully  automated  changes  the 
designation  of  authority  for  decision-making  tasks.  A  low 
level  of  automation  grants  authority  to  the  human,  making  the 
individual  an  active  participant  in  the  task  and  giving  the 
system  a  secondary  role  of  the  passive  monitor.  These  roles 
are  reversed  under  a  high  level  of  automation. 

Along  each  stage  of  automation,  varying  levels  can  be 
applied  to  achieve  a  lower  or  higher  DOA.  More  automation 
or  a  greater  DOA  can  be  achieved  with  both  higher  levels 
within  a  stage  and  later  stages  (Manzey,  Reichenbach,  & 
Onnasch, 2012). Also, higher DOAs are associated with
greater performance in addition to diminished workload
(Wickens et al., 2010). Since workload is reduced under a
higher DOA, the automation is taking on more of the cognitive
demand for those tasks than the operator. This leaves the
operator with more cognitive resources at higher DOAs. Thus,
working memory demands should lessen as the user moves
from a lower DOA towards a higher DOA.

Higher  complacency  can  take  the  form  of  performance 
detriments  under  unreliable  systems  and  performance  gains  for 
increasingly  reliable  automation.  For  instance,  a  meta-analysis 
found that higher DOAs lead to greater accuracy for younger
adults, but only when the automation performed optimally
(Onnasch, Wickens, & Manzey, 2013). However, there was a
greater performance cost for imperfect automation as DOA
increased. For younger adults, these findings reveal
differences in performance across DOAs, which seem to
indicate changes in complacent behavior. In this context of
comparing  performance  across  lower  and  higher  DOAs, 
research  on  the  older  adult  population  has  not  been  performed. 
In  terms  of  research  by  Ho  et  al.  (2005b),  it  is  still  unclear 
whether  the  high  working  memory  demands  of  the  task  or  the 
high  DOA  contributed  to  age-related  differences  in 
complacency. 

Current  Study 

The  aim  of  this  study  is  to  examine  the  relationship 
between  automation-induced  complacency  and  working 
memory.  Age-related  differences  in  working  memory  have 
been  implicated  as  a  possible  cause  of  age-related  differences 
in  automation-induced  complacency.  However,  prior 
automation studies (e.g., Ho et al., 2005b) have not
manipulated  working  memory  demands  of  the  task  to  observe 
how  complacency  is  affected.  Therefore,  we  will  use  two 
DOAs  that  vary  in  working  memory  demand.  This  study  will 
analyze  speed  and  accuracy  of  user  selections  at  each  DOA. 
Performance  under  reliable  and  unreliable  trials  can  provide 
information  to  infer  the  degree  to  which  users  are  complacent 
with  automation. 
METHOD


Participants 

Thirty-six  undergraduate  students  will  be  recruited  for 
this  research  and  given  course  credit  for  participation.  Thirty- 
six  older  adults  from  the  local  area  will  be  recruited  and  will 
be  compensated  for  their  time. 

Task 

The  tasks  for  this  study  will  be  adapted  from  prior 
research  that  uses  an  automated  system  in  the  context  of  a 
low-fidelity  UAV  simulation  (Rovira,  McGarry,  & 
Parasuraman,  2007).  The  primary  task  for  this  study  will  be  to 
quickly  and  accurately  find  the  closest  combination  of  friendly 
(green  units)  and  enemy  units  (red  units)  in  terms  of  distance 
apart  on  the  grid  (Figure  1).  Automation  will  be  presented  as  a 
table  in  the  bottom  left-hand  corner  of  the  screen,  which  will 
display  the  distances  and  unit  combinations  needed  by 
participants  to  complete  the  primary  task.  The  secondary  task 
will  consist  of  checking  for  a  specific  call  sign  and  clicking  a 
corresponding  button  when  it  appears  on  screen.  The  call  sign 
is  comprised  of  a  single  word  and  number  combination  (e.g. 
Hunter-6).  The  program  will  randomly  alternate  between  14 
different  call  signs  every  5  seconds  as  the  participant 
completes  the  primary  task. 


Figure  1.  Screenshot  of  a  low  degree  of  automation  (DOA)  and  low  workload 
trial  within  the  targeting  system  that  features  the  communications  panel  (top- 
left),  targeting  input  panel  (top-left),  automation  table  (bottom-left),  and  grid 
(right). 


Participants  will  complete  blocks  of  trials  in  a  random 
counterbalanced  order,  where  each  block  will  consist  of  a 
different  DOA  and  workload  level.  The  DOA  manipulation 
will  change  the  stage  and  level  of  the  automation  table  used  in 
the  task.  The  lower  DOA  will  use  the  information  acquisition 
stage,  which  presents  all  possible  friendly  and  enemy  unit 
combinations  from  the  grid,  with  a  low  level  of  automation 
that  does  not  sort  the  information  in  any  meaningful  way.  The 
higher  DOA  will  use  the  decision  and  action  selection  stage, 
which  will  present  the  top  3  friendly  and  enemy  unit 
combinations.  In  addition,  the  high  level  of  automation  will 
sort  the  information  based  on  importance,  so  that  the  shortest 
distance combination is presented at the top. The workload
manipulation will change the number of units presented in the
grid.  Low  workload  will  present  3  friendly  and  3  enemy  units, 
while  high  workload  will  show  6  friendly  and  6  enemy  units. 
Each  combination  of  DOA  and  workload  will  be  presented 
twice  for  a  total  of  8  blocks  and  240  trials. 

The  overall  automation  reliability  will  be  set  at  80%, 
which  is  above  the  threshold  for  imperfect  reliability 
acceptance  (Wickens  &  Dixon,  2007).  In  each  block  of  30 
trials,  24  trials  will  be  reliable  and  the  remaining  6  trials  will 
be  unreliable.  An  unreliable  trial  will  contain  inflated  distance 
values  between  unit  combinations  or  incorrect  optimal 
suggestions  in  the  automation  support  table.  The  first  aid 
failure will not occur until the 10th trial, so that users can
rebuild trust after each block. Also, the automation failures
will  be  distributed  randomly  throughout  the  remaining  trials. 
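
The block structure and failure placement described above can be sketched in code. This is a minimal illustration, not the study's actual software: the function names are hypothetical, trials 1 through 9 of every block are assumed to be always reliable, and a plain shuffle stands in for whatever counterbalancing scheme the authors use.

```python
import random

TRIALS_PER_BLOCK = 30
FAILURES_PER_BLOCK = 6      # 6 of 30 trials unreliable gives 80% reliability
FIRST_FAILURE_TRIAL = 10    # assumption: trials 1-9 are always reliable


def make_block_schedule(rng):
    """Return 30 booleans for one block; True marks a reliable trial."""
    eligible = range(FIRST_FAILURE_TRIAL, TRIALS_PER_BLOCK + 1)
    failure_trials = set(rng.sample(eligible, FAILURES_PER_BLOCK))
    return [t not in failure_trials for t in range(1, TRIALS_PER_BLOCK + 1)]


def make_session(seed=0):
    """Return 8 (doa, workload, schedule) blocks in a shuffled order."""
    rng = random.Random(seed)
    # Each DOA x workload combination appears twice: 2 x 2 x 2 = 8 blocks.
    conditions = [(doa, load) for doa in ("low", "high")
                  for load in ("low", "high")] * 2
    rng.shuffle(conditions)  # plain shuffle, not a true counterbalancing scheme
    return [(doa, load, make_block_schedule(rng)) for doa, load in conditions]


if __name__ == "__main__":
    for doa, load, schedule in make_session(seed=1):
        print(doa, load, schedule.count(False), "failures")
```

Eight blocks of 30 trials give the 240 trials stated above, with 48 unreliable trials per participant in total.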

Measures 

Ability  measures.  The  following  abilities  will  be  assessed: 
perceptual  speed  (digit-symbol  substitution;  Wechsler,  1997), 
working  memory  (automated  operation  span  (Aospan); 
Unsworth,  Heitz,  Schrock,  &  Engle,  2005),  and  vocabulary 
(Shipley  vocabulary;  Shipley,  1986).  These  measures  were 
chosen  because  they  are  reliable  indicators  of  their  respective 
abilities (e.g., Czaja et al., 2006; Unsworth et al., 2005). The
cognitive ability measures were selected to confirm age
differences in fluid and crystallized intelligence. Specifically,
the  working  memory  ability  measure  serves  to  control  for 
differences  in  targeting  task  performance  between  age  groups. 

NASA-Task  Load  Index  (NASA-TLX).  Subjective 
workload will be measured with the NASA-TLX (Prichard,
Bizo,  &  Stratford,  2011).  A  computer  version  of  the  task  will 
present  6  items  that  constitute  overall  workload:  mental 
demand,  physical  demand,  temporal  demand,  performance, 
effort  and  frustration.  Each  item  is  rated  on  a  Likert  scale  of  0 
to  20,  where  higher  values  indicate  increased  workload. 
Subjective  workload  will  be  calculated  as  the  average  of  the  6 
combined  items.  The  NASA-TLX  will  be  used  as  a 
manipulation  check  for  DOAs  and  age  differences  in 
perceived  workload. 
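
The workload score described above is a simple unweighted mean, sketched below. The function name and the dictionary keys are illustrative assumptions rather than part of the NASA-TLX software.

```python
TLX_ITEMS = (
    "mental demand", "physical demand", "temporal demand",
    "performance", "effort", "frustration",
)


def tlx_score(ratings):
    """Average the six 0-20 item ratings into one workload score."""
    for item in TLX_ITEMS:
        if not 0 <= ratings[item] <= 20:
            raise ValueError(f"{item} rating must be between 0 and 20")
    return sum(ratings[item] for item in TLX_ITEMS) / len(TLX_ITEMS)
```

A score is computed once per block, so each participant contributes one value for every combination of DOA and workload.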

Trust  Questionnaires.  Subjective  trust  will  be  measured 
with  a  general  rating  of  trust  in  automation  (Jian,  Bisantz,  & 
Drury,  2000).  This  measure  is  a  12-item  survey  that  is  rated  on 
a  Likert  scale  of  1  (not  at  all)  to  7  (extremely).  The  first  5 
questions  are  negatively  framed  and  the  last  7  are  positively 
framed.  Trust  is  the  sum  of  normal  and  reverse  coded 
responses.  Higher  scores  on  this  measure  indicate  greater  trust 
in  the  automated  system.  The  measure  will  be  analyzed  for 
age-related  differences  in  trust  towards  automation. 
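
The scoring rule for the 12-item trust survey can be sketched as follows. The sketch assumes that reverse coding maps a response r to 8 - r on the 1 to 7 scale, which is the usual convention but is not spelled out in the text; the function name is hypothetical.

```python
N_ITEMS = 12
N_NEGATIVE = 5               # the first 5 items are negatively framed
SCALE_MIN, SCALE_MAX = 1, 7


def trust_score(responses):
    """Sum 12 responses after reverse coding the negatively framed items."""
    if len(responses) != N_ITEMS:
        raise ValueError("expected 12 responses")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("responses must be between 1 and 7")
    # Reverse coding maps 1 to 7, 2 to 6, and so on.
    reverse_coded = [SCALE_MIN + SCALE_MAX - r for r in responses[:N_NEGATIVE]]
    return sum(reverse_coded) + sum(responses[N_NEGATIVE:])
```

Under this rule the total ranges from 12 (lowest trust) to 84 (highest trust).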

In  addition,  we  will  use  a  survey  adapted  from  Lee  and 
Moray  (1992)  to  measure  subjective  trust  specifically  towards 
each  DOA  and  working  memory  manipulation.  This  trust 
measure  will  pose  3  questions,  rated  from  0  (not  at  all)  to  100 
(extremely),  about  the  automated  aid  used  in  each  set  of  trials. 
For  example,  the  questions  will  ask  participants  to  answer  how 
much  they  trusted,  relied  upon,  or  benefited  from  using  the 
automated  aid.  The  overall  score  will  consist  of  an  average  of 
those  questions  and  higher  scores  will  indicate  higher  trust. 
Additionally, this questionnaire will be used to examine trust
differences  between  age  groups,  level  of  workload,  and  DOA. 

Complacency Potential Rating Scale (CPRS). The CPRS
measures an individual's potential for complacent behavior (Singh,
Molloy, & Parasuraman, 1993). This 20-item scale contains 4
filler items and is rated on a Likert scale of 1 (strongly
disagree) to 5 (strongly agree). The CPRS score is the sum of
these  responses  except  for  the  filler  responses,  where  higher 
values  on  this  measure  indicate  an  increased  complacency 
potential.  The  CPRS  was  selected  in  order  to  examine  age 
differences  in  complacency  potential. 

Design 

The  current  study  is  a  2  (age  group:  young  or  old)  x  2 
(DOA:  low  or  high)  x  2  (automation  reliability:  unreliable  or 
reliable)  x  2  (workload:  low  or  high)  mixed-subjects  design. 
Age  group  will  be  a  between-subjects  independent  variable. 
These  groups  will  differ  in  working  memory  capacity  because 
older  adults  have  been  shown  to  have  less  of  this  ability  than 
younger  adults.  DOA,  automation  reliability,  and  workload 
will  be  within-subjects  independent  variables.  The  DOAs 
serve  as  our  working  memory  demand  manipulation. 

The  dependent  variables  will  be  targeting  task  accuracy, 
targeting  task  completion  time,  complacency  potential, 
subjective  trust,  subjective  workload,  and  working  memory 
capacity.  Targeting  task  accuracy  will  be  measured  by  the 
mean  rate  of  optimal  responses  for  each  automation  block.  An 
optimal  response  is  the  identification  of  the  closest  pair  of 
friendly  and  enemy  units  on  the  targeting  task  grid.  Targeting 
task  time  will  be  measured  by  the  average  duration  (in 
milliseconds)  it  takes  participants  to  complete  each  trial. 
Complacency  potential  will  be  comprised  of  scores  on  the 
CPRS.  Subjective  trust  will  be  measured  by  the  sum  of 
subjective  ratings  on  the  trust  questionnaire  for  each 
combination  of  DOA  and  workload  level.  Subjective  workload 
will  consist  of  an  average  of  the  6  items  on  the  NASA-TLX 
and  will  be  measured  for  each  combination  of  DOA  and 
workload  level.  Working  memory  capacity  will  be  measured  as 
the  sum  of  perfectly  recalled  sets  of  letters  on  the  Aospan  task. 

Procedure 

Participants  will  be  seated  at  individual  PC-computers 
and  provided  with  informed  consent.  They  will  be  instructed 
to  complete  the  demographics  form  and  the  cognitive  ability 
measures.  The  experimenter  will  then  tell  participants  to  open 
and  observe  the  targeting  task  instructions  screen.  Participants 
will  be  told  the  following:  “In  this  experiment,  you  will  have 
two  tasks.  The  first  task  will  be  to  monitor  the 
communications  panel  for  the  call  sign  Hunter-6.  When  you 
see  Hunter-6,  you  should  click  the  answer  button.  The  second 
task  will  be  to  target  enemy  units  with  the  closest  artillery  unit 
as  quickly  as  you  can.  You  will  do  this  by  first  selecting  an 
artillery  unit  and  then  select  an  enemy  target  from  the  list  of 
buttons.  The  computer  will  sometimes  help  you  with  this  task 
by  showing  you  the  distances  between  friendly  and  enemy 
units.  Sometimes,  two  sets  of  targets  will  have  the  same 
distance. In this case, you will pick the one with the shortest
distance to the headquarters. Sometimes the computer aid will
give  you  lots  of  information,  other  times  it  will  give  you  much 
less  information.  The  computer  can  be  very  reliable  but  it  is 
not  perfect  all  the  time.”  After  these  instructions,  the 
experimenter  will  answer  questions  before  continuing. 

As  the  participants  complete  the  tasks,  the  units  in  the 
grid and the values within the automation table will change for
each  subsequent  trial.  Between  each  block  of  trials, 
participants  will  fill  out  the  NASA-TLX  and  a  brief  subjective 
trust  measure.  During  the  experiment,  a  screen  will  appear  to 
indicate  when  participants  linger  too  long  on  a  particular  trial. 
If  participants  do  not  input  friendly  and  enemy  unit 
combinations  within  the  set  time  limit,  the  program  will 
automatically  continue  to  the  next  trial.  Younger  adults  will 
have  10  seconds  to  complete  each  trial,  while  older  adults  will 
have  15  seconds.  Older  adults  will  have  more  time  for  the  task 
because  of  normative  age-related  differences  in  psychomotor 
speed  (Salthouse,  1985). 

Participants  will  proceed  through  each  block  of  trials  and 
the  computer  will  notify  them  when  they  are  finished.  When 
they  complete  the  automation  program,  participants  will  be 
presented  with  a  general  subjective  measure  of  trust  in 
automation  and  the  CPRS.  At  the  conclusion  of  the 
experiment,  participants  will  be  debriefed  and  provided 
compensation  for  their  time. 

EXPECTED  RESULTS 

Repeated  measures  ANOVAs  will  be  performed  to  test 
these  expected  results.  We  anticipate  main  effects  of  DOA  as 
well  as  age  group  on  targeting  task  accuracy  and  task  time, 
where  younger  adults  should  outperform  older  adults.  Overall, 
we  expect  participants  to  perform  better  under  a  higher  DOA 
(i.e.  decision  selection  stage  with  higher  level)  than  a  lower 
DOA  (i.e.  information  acquisition  stage  with  lower  level). 
Also,  we  will  measure  differences  in  subjective  workload  and 
trust  towards  specific  DOAs  and  levels  of  workload.  For  those 
variables,  we  expect  to  find  main  effects  of  workload  and 
DOA. 

Since  we  expect  an  inverse  relationship  between  DOA 
and  cognitive  demand,  we  hypothesize  that  older  adults  will 
have  a  greater  tendency  to  become  complacent  under  a  lower 
DOA.  We  can  infer  the  extent  to  which  participants  are 
complacent  by  analyzing  their  pattern  of  performance  at 
different  reliability  levels.  A  greater  difference  between 
performance  with  unreliable  and  reliable  automation  indicates 
higher  complacency  because  the  user  is  relying  heavily  on  the 
system  without  monitoring  for  failures.  Therefore,  we  will 
perform  a  repeated  measures  ANOVA  to  examine  targeting 
task  accuracy  for  unreliable  and  reliable  trials  across  DOAs 
and  age  groups.  We  hypothesize  a  lower  DOA  will  result  in  a 
greater  difference  in  complacency  between  age  groups  than  a 
higher  DOA.  We  anticipate  this  result  because  a  higher  DOA 
should  support  working  memory  ability  by  taking  on  more 
cognitively demanding tasks that would otherwise burden the
user.  Consistent  with  previous  findings,  younger  adults  should 
be  more  inclined  to  become  complacent  with  a  higher  DOA. 
When  taking  into  account  age  group  differences  in  working 
memory ability, we expect that age-related performance
effects  will  not  be  present. 
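
The inference described above, that a larger accuracy gap between reliable and unreliable trials indicates more complacency, can be expressed as a simple difference score. The sketch below is illustrative only: the trial record format (dictionaries with "reliable" and "correct" keys) and the function names are assumptions, and the planned analysis is an ANOVA on accuracy rather than on this index.

```python
def accuracy(trials):
    """Proportion of trials with an optimal (correct) response."""
    return sum(t["correct"] for t in trials) / len(trials)


def complacency_index(trials):
    """Accuracy on reliable trials minus accuracy on unreliable trials."""
    reliable = [t for t in trials if t["reliable"]]
    unreliable = [t for t in trials if not t["reliable"]]
    if not reliable or not unreliable:
        raise ValueError("need both reliable and unreliable trials")
    return accuracy(reliable) - accuracy(unreliable)
```

For example, a participant who is correct on all 24 reliable trials of a block but on only 3 of the 6 unreliable trials has an index of 0.5, while a participant who catches every failure has an index of 0.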

Finally,  we  anticipate  that  older  adults  will  have  higher 
general  trust  and  complacency  potential  than  younger  adults. 
We  will  conduct  two  independent  samples  t-tests  to  compare 
differences  in  complacency  potential  and  general  trust  in 
automation  between  age  groups. 
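
For reference, the test statistic for an independent samples t-test can be computed as sketched below. The text does not say whether equal variances are assumed; this sketch uses the classic pooled-variance form, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```

With 36 participants per age group, each comparison would have 70 degrees of freedom under this form.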

DISCUSSION 

It  is  important  to  understand  the  factors  that  contribute  to 
complacent  behaviors  within  the  human-automation 
interaction.  For  the  design  of  automated  systems,  it  is 
necessary  to  consider  factors  such  as  reliability  and  workload. 
Since  high  system  reliability  is  common  in  most  automated 
technologies  today  and  thus  makes  users  more  susceptible  to 
complacent  behaviors,  it  is  essential  to  alert  the  user  to 
potential  automation-related  failures  that  can  occur.  In  terms 
of  task  demands,  keeping  the  task  manageable  for  the  user  is 
critical  for  detecting  and  correcting  inaccuracies. 

Designers  should  select  the  appropriate  DOA  for  the 
known  population  of  users.  Specifically,  the  design  of 
automated  tasks  should  consider  the  age  of  the  user. 
Automation  can  be  presented  in  many  different  ways  and  can 
perform  a  wide  range  of  tasks  for  the  user.  Depending  on  the 
type  of  task,  some  forms  may  demand  more  working  memory 
than  others.  Limiting  working  memory  demand  through 
automation  can  be  beneficial  to  both  younger  and  older  adults. 
This  may  help  to  reduce  the  occurrence  of  complacent 
behaviors  during  interaction  with  automation. 
Effects of Age and Gender Stereotypes on Trust in an
Anthropomorphic Decision Aid


Brock Bass¹, Meghan Goodwin², Kayla Brennan¹, Richard Pak¹, & Anne McLaughlin²
¹Clemson University, ²North Carolina State University


Stereotypes are beliefs about the capabilities of another group. Previous research indicates
stereotypes can affect how users interact with anthropomorphic computer aids. User perception can be
affected by gender and age stereotypes elicited by the appearance of the computer system. Other
research has shown that perceptions of automation (e.g., implicit ones such as propensity to trust
automation, or perceptions of etiquette) interact with reliability to influence automation trust
behavior. The current study built upon these ideas to examine whether implicit beliefs (i.e.,
stereotypes) about the perceived age and gender of automation interacted with reliability to affect
perceptions of trust in automation. We employed a factorial survey where we presented scenarios of
automation to younger adults. The anthropomorphized automation had a perceived age and
gender, and was stated to be variably reliable.


INTRODUCTION 

Stereotypes  in  Human-Computer  Interaction  (HCI) 

Stereotypes are preconceptions about the traits,
behavior, or abilities of another group. They help set our
expectations of individuals that we meet. For example, a
commonly held stereotype of athletes is that they are
unintelligent, but have social prowess. As the example
shows, stereotypes can have both negative and positive
connotations that may be inconsistent with real group
attributes (i.e., not all athletes may be unintelligent or
have social prowess). Stereotypes have adaptive value
because they function as schemas by filtering and
organizing incoming information, thereby easing processing
and interpretation (Hilton & Von Hippel, 1996). However,
when the stereotype is highly simplified or inaccurate,
it can lead to errors in perceptions and behavior.

Stereotypes do not just affect person-perception, but
also computer-perception. Computers, intentionally or
not, can exhibit anthropomorphic characteristics.
Anthropomorphism can be defined as the attribution of
human characteristics (e.g., mental states, motives, and
emotions) to non-human agents, such as computers
(Epley, Waytz, & Cacioppo, 2007). Previous research
has investigated the phenomenon of human users imputing
human social characteristics (e.g., stereotypes) to
computer systems (Nass, Steurer, & Tauber, 1994). This
phenomenon is addressed by the Computers are Social
Actors (CASA) experimental paradigm (Nass et al., 1994). The
CASA experimental paradigm described by Nass et al. is
as follows: pick a social science finding, replace the
human with a computer, design the computer with
characteristics associated with humans, and determine if the
rule still applies. A wide range of studies using the
CASA paradigm have shown that users tend to treat
computers with the same social rules and heuristics as
they would other people (Fogg & Nass, 1997; Katagiri,
Nass, & Takeuchi, 2001; Zanbaka, Goolkasian, &
Hodges, 2006).

Gender  and  Age  Stereotypes  in  HCI 

Previous research has shown that gender stereotypes are present in HCI. Nass et al. (1994)
confirmed that humans apply gender stereotypes relating to “knowledgeability” in HCI
situations. That is, when the perceived gender of the computer voice matched the stereotypic
topic (love and relationships for the female voice; computers and technology for the male
voice), subjects rated the computer as a better teacher. This finding confirms the pre-existing
stereotype linking gender to appropriateness of topic. An implication of this finding is that
computers can be perceived as “gendered” just as we assign gender stereotypes to humans. At a
broader level, the study supports the influential power that gender stereotypes carry (Nass et
al., 1994). Further supporting users’ tendency to gender-type computers, Lee (2003) showed that
people conform to the advice of an aid that is sex-stereotypically matched (feminine aid for
fashion, masculine aid for sports). These gender stereotype studies demonstrate that even when
users may be unaware of applying stereotypes, these stereotypes nonetheless become activated
and affect perceptions in a task with a gendered computer.

Interestingly, perceived ethnicity has been shown to contribute to user perceptions of benefits
from anthropomorphic agents more than perceived gender (Benbasat, Dimoka, Pavlou, & Qui, 2010).
Users perceived agents as more enjoyable and useful when there was a perceived ethnicity match
(Asian users with Asian aids, Caucasian users with Caucasian aids), but there was no effect for
gender matching. Benbasat et al. concluded that the significant perceived ethnicity effect was
due to the similarity-attraction hypothesis (Byrne, 1971), which states that people are more
attracted to those who are similar to themselves.

The vast majority of CASA studies examined a single age group (younger adults) and thus have
not examined or manipulated perceived age of the computer agent. Age is one of the first and
most salient attributes we notice about other people (Fiske, Kitayama, Markus, & Nisbett,
1998), which may also be true of anthropomorphic agents in HCI. Therefore, examining age
stereotypes among younger and older adults is relevant in HCI. There is evidence that age
stereotypes (i.e., stereotypes about older adults) are much stronger (Kite, Deaux, & Miele,
1991) and more complex than gender stereotypes (Kite, Stockdale, Whitley, & Johnson, 2005).
Kite et al. (1991) assessed age and gender stereotypes and showed that when negative
stereotypes were generated, they were more likely due to the age of the target person than the
gender (approximately 3 times greater in magnitude). This suggests that, according to the
similarity-attraction hypothesis (Byrne, 1971), older adults should exhibit positive
anthropomorphic effects with automation that matches their age group. A previous study (Pak,
Fink, Price, Bass, & Sturre, 2012) found that a young female agent affected trust in automation
in younger adults, but not in older adults. One explanation for the age difference was that the
dissimilarity between a younger female decision aid and an older participant may have muted any
potential anthropomorphic effect on trust due to the similarity-attraction hypothesis. An
alternative explanation is that older adults hold negative stereotypes of the capabilities of
younger, female doctors.

Stereotype  Activation  Depends  on  Individuating 
Information 

Individuating information such as context (e.g., interacting with a doctor) is also known to
determine which aspect of a stereotype gets activated (Casper, Rothermund, & Wentura, 2011).
Kunda and Sherman-Williams (1993) found that, whatever the stereotype, its ultimate construal
and effect on judgment will depend on the individuating information. Knowing the occupation of
an individual is a type of individuating information that seems to alter some negative age
stereotypes. For instance, people hold stereotypes that in general older workers have lower
ability, are less motivated, and are less productive than younger workers (Posthuma & Campion,
2009). Older adult workers are seen as less adaptable to changing work situations and
uncertainty than younger workers (DeArmond, Tye, Chen, & Krauss, 2006). However, the occupation
of physician is moderately seen as a stereotypically older male occupation (Singer, 1986), even
though it is an occupation that may require adaptability and facing uncertainty. The
individuating information (i.e., occupation of a doctor) allows certain aspects of the
stereotype to be activated but not others.

In addition to aspects like occupation, another type of individuating information is past
behavior, or more relevant to the current study, a past history of ambiguous system performance
(i.e., history of moderate reliability). The assumption is that ambiguous system performance
will lead to stereotype activation, while unambiguous system performance (i.e., history of
unambiguously low or high reliability) will not. Merritt and Ilgen (2008) theorized that
implicit attitudes about automation affect whether an individual trusts automation. The user’s
explicit (e.g., reliability) and implicit (e.g., stereotypes) beliefs (schemas) about
automation will shape their perceptions of automation behavior (Dzindolet, Pierce, Beck, &
Dawe, 2002). Merritt and Ilgen found that when automation reliability was ambiguous, implicit,
pre-existing beliefs about automation were more influential in determining trust than explicit
beliefs. Presumably, in the face of automation ambiguity, individuals made attributions that
were consistent with their implicit, schematic pre-existing beliefs about automation. This
parallels findings from the social cognition literature, which show that causal reasoning is
common when an individual is faced with conflicting or ambiguous information (Kunda & Thagard,
1996). That is, when automation is unambiguously good or bad, stereotypes should not affect
perceptions. But when automation is ambiguous, stereotypes will exert an effect on perceptions
of the automation (i.e., trust). Previous human factors research has shown that automation
reliability has a “crossover point” or threshold that affects human operator performance. This
threshold occurs when automation is below 70% reliable, and results in operator performance
similar to situations with no automation. That is, when automation is much less than 70%
reliable, the operator begins to behave as if there were no automation present (Wickens &
Dixon, 2007).

Current  Study 

The purpose of the current study was to investigate how trust in automation is affected by
stereotypes (age and gender) and how these stereotypes interact with machine factors
(reliability) to affect user trust. We manipulated anthropomorphic aids on the following
variables: perceived gender (male, female), perceived age (young, old), and automation
reliability (45%, 70%, 95%) to investigate their effect on user trust in HCI. The 70%
reliability level reflects the “crossover point” described in previous literature, and the 45%
and 95% reliability levels were chosen as substantial deviations from 70% reliability (i.e., 25
percentage points below or above it). In the current study, we used a factorial survey
methodology, which is a type of survey that contains elements of a factorial experiment. In a
factorial survey, a respondent evaluates a scenario and then is asked to make a judgment of
interest. The scenario can be a short story or a snapshot of a situation, which in the current
study is in the context of diabetes management. The important aspect is that specific factors
of the scenario are being manipulated (in a factorial manner). The respondent is repeatedly
exposed to all combinations of factors in a series of scenarios. Because our dependent variable
(trust) is a social judgment about a situation, a factorial survey is an ideal way to measure
how judgment is influenced by perceptions of the automation (i.e., age, gender, reliability) as
well as individual differences. Factorial surveys have been widely used in various domains to
examine how beliefs, judgments, and decision-making are influenced by situational factors.

METHOD 

Participants 

The participants for the study were Clemson University undergraduate students (N = 50). The age
range for these participants was 17 to 23 (M = 18.58, SD = .93). These introductory psychology
students received class credit for participating in the study.

Apparatus 

Participants viewed the experiment on desktop computers situated in cubicles. The computers
presented stimuli on 19-inch LCD monitors and participants made all responses using the
keyboard and mouse. They were seated in office chairs about 18-24 inches from the screen in an
office environment.

Design 

The study was a 2 (gender of respondent: male, female) x 2 (perceived aid age: young, old) x 2
(perceived aid gender: male, female) x 3 (automation reliability: low, medium, high) mixed
factorial survey. The first variable (gender of respondent) was a quasi-independent grouping
variable, while the last 3 were within-groups manipulations of the automation. The dependent
variables were trust, likelihood of following advice, complacency potential rating scale (CPRS;
Singh, Molloy, & Parasuraman, 1993), and the participant’s diabetes knowledge.
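
The within-groups part of this design can be enumerated mechanically. The following is a minimal sketch, not the authors' survey software: the three factors and their levels come from the text, while the function name `build_vignettes`, the dictionary keys, and the assumption that each of the 12 cells was presented twice (which would account for the 24 vignettes mentioned in the Procedure) are hypothetical.

```python
from itertools import product

# Factor levels taken from the study description.
AID_AGES = ("young", "old")
AID_GENDERS = ("male", "female")
RELIABILITIES = (0.45, 0.70, 0.95)  # low, medium, high

# Assumption: two presentations per cell, giving 12 x 2 = 24 vignettes.
REPETITIONS = 2


def build_vignettes(repetitions=REPETITIONS):
    """Return one record per vignette, fully crossing the three aid factors."""
    cells = list(product(AID_AGES, AID_GENDERS, RELIABILITIES))
    return [
        {"aid_age": age, "aid_gender": gender,
         "reliability": reliability, "repetition": rep}
        for rep in range(1, repetitions + 1)
        for (age, gender, reliability) in cells
    ]


vignettes = build_vignettes()
print(len(vignettes))  # 2 ages x 2 genders x 3 reliabilities x 2 repetitions
```

In practice the presentation order would presumably be randomized per participant; the source does not say how, so that step is omitted here.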


Procedure 

Participants first completed a diabetes knowledge questionnaire administered on a computer. The
23 questions assessed basic knowledge about diabetes and diabetes management. Next,
participants started the factorial survey portion. Participants were told the following: “You
are playing the part of a newly diagnosed diabetic. Your doctor has given you a variety of
different smartphone apps that may help you with your diabetes care. Your task involves giving
us your opinion of the different smartphone apps. Just like many technological aids, the
different apps will only sometimes seem reliable. Your performance is not being tested so you
do not have to try to solve every problem. Instead, you are making judgments of the smartphone
apps as quickly as possible.” After acknowledging the instructions and asking any remaining
questions, they began the survey. In the survey, participants viewed each vignette and were
asked the following questions: 1) how much they trusted the smartphone app on a Likert scale
from 1 (not at all) to 7 (very much), and 2) whether they would follow the advice of the app
(Likert scale, 1-7). After each question, participants were also asked to explain their rating
by typing a brief explanation in a subsequent field. To reinforce the notion that the
smartphone app was real automation (and not just a pre-computed image), the smartphone app did
not reveal its answer for 1.5 seconds (in the interim the message “Analyzing the scenario. Just
a moment...” appeared on the smartphone screen). After responding to 24 vignettes, participants
completed the complacency potential rating scale. Finally, after completing the CPRS,
participants answered the question, "What do you think the study was about?" This question was
used to determine whether participants were aware of our experimental manipulation and thus
prone to demand characteristics.

RESULTS 

The results from this study are presented in two sections aligned with data type: quantitative
(Likert ratings) and qualitative (explanations for Likert ratings).

Quantitative  Data 

To examine differences in trust as a function of aid characteristics, a 2 (age group of aid) x
2 (gender of aid) x 3 (reliability of aid) ANOVA was conducted on the Likert trust ratings that
participants gave for each aid. There was a significant 3-way interaction of age group of aid x
gender of aid x reliability of aid, F(1, 1440) = 3.84, p < .05. The trust ratings were further
analyzed as a function of the reliability of the aid (see Figure 1). In the low reliability
(45%) condition, the older female aid was the most trusted and the younger male aid was the
second most trusted. In the moderate reliability (70%) condition, the older male aid was the
most trusted, while the younger female aid was the second most trusted. This replicated
previous research findings (Pak et al., 2012) that showed younger adults trusted a younger
female aid significantly more than a non-anthropomorphic aid. In the high reliability (95%)
condition, there was no significant difference in trust ratings. Regardless of the aid
characteristics (i.e., gender, age), the participants indicated similar trust ratings for each
aid in the high reliability condition.


Figure 1. Trust ratings for anthropomorphic aids.

Figure 2. Categorization of trust explanations. (Axes: age/gender of aid; percent of statements
matching category.)


Qualitative Data

The participants’ explanations for their trust ratings were used to help interpret the
numerical trust ratings presented above. Two coders achieved reliability above 70% on the
simple coding scheme. Trust explanations were coded by their dominant theme using a coding
scheme generated from a random subset of statements. We have currently examined only about one
quarter of the qualitative data (approximately 340 of the 1200 statements), but the trends
(Figure 2) seem to show that subjects did not overtly attribute trust ratings to perceived age
or gender (categories A and B). However, there did seem to be a trend to attribute trust
ratings more to a general tendency to trust/mistrust machines when the aid was younger
(category C) compared to older. In addition, subjects stated that they trusted/mistrusted the
aid because of their double-checking efforts most for the older male aid (category E). This may
be reflective of the stereotype of older male doctors (i.e., older men are trustworthy
doctors).
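
The text does not state which agreement statistic lies behind the "above 70%" coder reliability. As a hedged illustration only, the sketch below computes simple percent agreement, one common choice; the function name `percent_agreement` and the example category codes are hypothetical and are not taken from the study's data.

```python
def percent_agreement(codes_a, codes_b):
    """Proportion of statements that two coders assigned to the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of statements")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical codes for five statements (letters stand for coding categories).
coder_1 = ["A", "C", "E", "C", "B"]
coder_2 = ["A", "C", "E", "D", "B"]
print(percent_agreement(coder_1, coder_2))  # 4 of 5 statements match
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic such as Cohen's kappa would be a stricter alternative if the coding data were available.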


DISCUSSION 

The findings of the current study extend the literature about how people treat and behave with
anthropomorphic computer aids. We found that age stereotypes can be elicited and can affect
trust, but in a complex way. The significant 3-way interaction (age group of aid x gender of
aid x reliability of aid) can be thought of as a 2-way interaction (age group of aid x gender
of aid) that varied across the independent variable of the reliability of aid.

Although highly reliable automation is desired, many automated systems would be classified as
having moderate reliability. In light of our findings, moderately reliable automation would
activate user stereotypes and subsequently affect the user’s trust ratings. This finding shows
the necessity of proper use of stereotypes in automation, specifically when it is ambiguous in
reliability. Designing automation to contain anthropomorphic aids that activate users’
stereotypes could be a future area of dispute and present difficult questions. For example, if
automation is only moderately reliable, should there be an anthropomorphic aid that may cause
users’ trust in this automation to increase? This question may have to be answered according to
the context in which the automation is aiding the user, and the consequences associated with
following the aid. A future study could examine the current study’s finding of a higher level
of trust in the older female aid in low reliability conditions. Potentially, there is a
“motherly” aspect to some aids that causes trust when the conditions warrant this behavior. The
current study has provided more evidence that HCI is similar to human-human interaction, and
that the interaction between stereotypes and individuating information (e.g., automation
reliability) is an area rich for exploration.
Faces as Ambient Displays: Assessing the attention-demanding
characteristics of facial expressions


Brock  M.  Bass  &  Richard  Pak 
Clemson  University 
Department  of  Psychology 

Ambient  displays  are  used  to  provide  information  to  a  user  in  a  non-distracting  manner.  The 
purpose  of  this  research  is  to  examine  the  efficacy  of  facial  expressions  as  ambient  displays.  Facial  emotion 
recognition  requires  very  little  if  any  conscious  attention,  which  makes  it  an  excellent  candidate  for  the 
ambient  presentation  of  information.  This  study  will  investigate  whether  using  facial  expressions  as  an 
ambient display permits humans to gain information with ease. This study will assess the
attention-demanding characteristics of Chernoff faces in a dual-task experiment. The data from this study could be
helpful  in  understanding  whether  humans  are  able  to  use  facial  expressions  for  gaining  quick  and  concise 
information  about  a  particular  system  or  device. 


INTRODUCTION 

Ambient displays convey information to the user without being very cognitively demanding; they
are in the background. Examples include the battery meter icon of a computer interface, or a
dangling string from the ceiling that represents network traffic on a computer network (Weiser
& Brown, 1995). Some important characteristics of ambient displays are: useful and relevant
information, sufficient information design, consistent and intuitive mapping, and the match
between the system and the real world (Mankoff, Dey, Hsieh, Kientz, Lederer, & Ames, 2003).
Using these heuristics as a benchmark, facial expressions could be considered a type of ambient
display. The purpose of this study is to examine the ambient quality of facial expressions;
that is, to measure their attention-demanding qualities when conveying simple numerical
information. We will study this in the context of user-system automation calibration.

When users are interacting with computerized decision support systems or automated aids, the
user must, over time, determine how much they should trust the system. Optimally, the user
would calibrate their trust to match the level of actual system reliability. That is, the user
would be highly trusting of a highly reliable automated system and distrusting of a very
unreliable system (Parasuraman, 1997). However, this scenario of human-automation interaction
(HAI) can be problematic in some cases. For example, an operator may place too much trust in
unreliable automation, also known as misuse of automation. Conversely, an operator may not
place enough trust in reliable automation, which can lead to disuse of automation. An
operator’s misuse or disuse of automation is a function of their level of trust, which is a
byproduct of their perceptions about the reliability of the automation (Parasuraman, 1997). The
goal of this study is to determine if increasing the deducibility and transparency of
trial-level automation reliability can enhance users’ ability to judge overall system
reliability, and thus calibrate trust. This transparency of automation reliability may allow
operators to interact with ambient displays more appropriately.
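
The calibration idea in this paragraph can be made concrete with a small sketch. It is illustrative only: expressing trust and reliability on a common 0-1 scale, the function name `classify_calibration`, and the tolerance of 0.1 are assumptions introduced here and do not come from Parasuraman (1997) or from the study.

```python
def classify_calibration(trust, reliability, tolerance=0.1):
    """Compare an operator's trust with actual system reliability.

    Both values are assumed to lie on a 0-1 scale. The tolerance is an
    arbitrary illustrative band within which trust counts as calibrated.
    """
    if not (0.0 <= trust <= 1.0 and 0.0 <= reliability <= 1.0):
        raise ValueError("trust and reliability must be between 0 and 1")
    gap = trust - reliability
    if gap > tolerance:
        return "misuse"      # trust exceeds what the reliability warrants
    if gap < -tolerance:
        return "disuse"      # trust falls short of what the reliability warrants
    return "calibrated"


print(classify_calibration(0.9, 0.45))  # high trust, unreliable system
print(classify_calibration(0.3, 0.95))  # low trust, reliable system
print(classify_calibration(0.7, 0.70))  # trust matches reliability
```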

One plausible way an automated system can present more transparent information about its own
reliability is to present its own confidence in its recommendation. This concept can be
categorized under the ambient display heuristic of useful and relevant information. Many
automated systems, particularly of the decision support type, are able to present to the user
their level of confidence in the automated advice. For example, if the system is working from
faulty data, it will weigh its advice as potentially unreliable. The exchange of information,
in this case system confidence, is a way of diminishing the uncertainty that can exist in HAI
(Bubb-Lewis & Scerbo, 1997). Trust is a malleable variable that can be shaped through
interactions with a system (Antifakos, Kern, Schiele, & Schwaninger, 2005). If a system is
presenting the operator with its system confidence level, then the operator should be able to
build a more appropriate relationship with the automation. Some previous research has indicated
that methods such as tactile output or auditory output may be helpful in conveying system
confidence (Wisneski, 1999; Poupyrev, Maruyama, & Rekimoto, 2002; Sawhney & Schmandt, 2000).
While these modalities are novel in certain capacities, a less intrusive and less attention
demanding modality would be more beneficial to users (Antifakos, Kern, Schiele, & Schwaninger,
2005).

One novel information presentation format is the use of facial expressions. An interesting area
of facial expression research involves Chernoff faces (Chernoff, 1973). These faces were
created to represent multivariate data in a way that would allow the viewer to gain information
in a quick, yet complete manner. Chernoff (1973) makes a point that humans are accustomed to
viewing and interpreting faces. Differences in the configuration of a face, even small ones,
can be noticed by humans (Chernoff, 1973). If this statement is in fact true, facial expression
may act as a superb source of information output. Previous studies have investigated the
effectiveness of Chernoff faces with mixed results. A previous study concluded that Chernoff
faces are not processed pre-attentively, and do not benefit users more than other modes of
visual information display (Morris, Ebert, & Rheingans, 2000). The process of identifying the
characteristics (i.e., eyebrow slant, eye size, nose length) of the Chernoff face was said to
be a serial process (Morris, Ebert, & Rheingans, 2000). A similar study investigating
perceptual sensitivities for Chernoff faces found that children process Chernoff faces
differently from adults (Tsurusawa, Goto, Mitsudome, Nakashima, & Tobimatsu, 2008). Children
focus more on individual features, while adults process a face holistically (Tsurusawa, Goto,
Mitsudome, Nakashima, & Tobimatsu, 2008). It was found that people encode the meaning of a face
through the perceptual features of the face. Specifically, the eyebrows and mouth are important
for this encoding (Morris, Ebert, & Rheingans, 2000; Tsurusawa, Goto, Mitsudome, Nakashima, &
Tobimatsu, 2008). If Chernoff faces are manipulated properly, giving the right amount of useful
information, they will fulfill the heuristic of sufficient information design as an ambient
display.

Human  Emotion  Decoding 

Research has shown that humans are able to automatically recognize emotion through facial
expressions. Given this information, using facial expressions as ambient displays would not add
cognitive load and would satisfy the heuristic of consistent and intuitive mapping. Studies
have shown that tasks involving affective (emotional) stimuli may be responded to without
awareness (Whalen, 1998; Morris, 1998). For example, it was found that the amygdala seems to
have an automatic response to facial expressions. Data from the fMRI confirmed that
participants experienced an increase in amygdala activation during the experiment (Whalen,
1998). This indicates that even though participants were unaware of the presentation of
emotional facial expressions, they still processed this information. The conclusions of this
study make a case that explicit knowledge is not necessary for a person to process emotional
facial expressions. This process is done below the level of conscious awareness, or in other
terms, automatically (Whalen, 1998).

Neuroimaging studies have supported the notion that emotional processing of faces is a more
effective pathway than the processing of other stimuli. A previous study compared the automatic
processing of emotional facial expressions versus emotional words. Rellecke (2011) hypothesized
that facial expressions would be decoded more automatically than words, due to their perceptual
features and humans’ natural ability to decode them. Based on the results of event-related
brain potentials (ERPs), facial expressions were found to have a prolonged effect on the brain.
This finding suggests that emotional facial expression processing is automated to a greater
extent than emotional word processing (Rellecke, 2011). This study also discussed the
possibility that there are preconditions necessary for advanced automatic processing of
emotional words. The two stimuli were tested in the same superficial stimulus analysis task,
but only one (facial expression) led to advanced automatic processing. Facial expression seems
to be a stimulus that needs no prompting or preconditions to allow fast, but also meaningful
processing (Rellecke, 2011). With indications that facial expressions are a more effective
pathway for the decoding of emotional data, we want to investigate the limits and capabilities
of this potentially new modality for information transport.


In order for facial expression to be used as a means of relaying quantitative system/automation
information, we must know if users are able to properly and consistently decode facial
expression intensity into a consistent quantitative value (e.g., an intense smiling face
represents 90%). Hess (1997) investigated facial expression decoding with varying degrees of
intensity for different emotional categories. It was determined that when participants were
given an emotional facial expression stimulus, they were accurate at perceiving the stimulus’
physical intensity. Graphically, this means that there is a positive linear trend for the
perceived intensity of the expression by the human versus the actual physical intensity of the
emotional facial expression (Hess, 1997).

Understanding the effects that different emotional facial expressions and their intensities
have on humans’ ability to decode them is critical in determining the most effective stimuli to
use as ambient displays.

Age-Related  and  Cultural  Effects  on  Decoding 

Despite the ease with which humans are able to decode emotional facial expressions, this
ability is still moderated by age and cultural aspects. Age can alter a person’s ability to
correctly perceive and understand the facial expression that is presented before them.
Neuropsychological research has shown that age-related issues in facial expression decoding may
be a result of problems with the medial temporal lobe (Orgeta, 2007). The amygdala is housed
here, which corroborates previous research that suggests the amygdala is necessary for facial
expression decoding (Whalen, 1998; Morris, 1998). There is an interesting paradox that has been
asserted for older adults involving their ability to decode emotional facial expressions.
According to the socioemotional selectivity theory, older adults are actually more aware of
certain emotional situations and images than non-emotional ones (Orgeta, 2007).

Some  studies  yielded  results  that  showed  older  adults 
as  being  more  aware  of  positive  facial  expressions,  but  not 
negative  facial  expressions  (Orgeta,  2007).  The  results  of  this 
study  indicated  that  there  is  an  age-related  difference  when 
decoding  positive  versus  negative  facial  expressions.  Orgeta 
(2007)  found  that  for  the  facial  expressions  of  sadness  and 
fear,  there  was  not  a  larger  age-effect  based  on  the  expression 
being  higher  in  perceptual  cost.  An  image  that  was  only  50  % 
expressive  did  not  show  larger  age-related  effects  than  a 
100  %  expressive  image.  This  complements  previous  research
because  it  indicates  that  the  major  issue  in  age-related  decline 
with  facial  expression  decoding  comes  from  cognitive  decline 
and  not  perceptual  decline  (Orgeta,  2007).  Another  issue  that 
affects  the  decoding  of  facial  expression  is  culture.  There  are 
six  basic  emotions  that  transcend  culture.  They  are:  anger, 
happiness,  fear,  surprise,  disgust,  and  sadness  (Ekman  & 
Friesen,  1975).  These  emotions  can  be  represented  with  facial 
expressions  and  are  readily  recognizable  (Lee,  2006;  Batty, 
2003).  Because  these  facial  expressions  are  not  confined  to 
specific  cultures,  it  puts  no  restraints  on  the  ability  of  different 
people  groups  to  successfully  decode  these  facial  expressions 
(Ekman  &  Friesen,  1971).  It  appears  that  increasing  age  is  a 
factor  that  may  cause  difficulty  in  facial  expression  decoding,
while  culture  seems  to  be  of  no  hindrance.  Due  to  facial
expressions’  prevalence  and  familiarity  in  human  culture,
making  them  an  ambient  display  allows  the  heuristic  of 
matching  the  system  to  the  real  world  to  be  met. 

LIMITATIONS  OF  PREVIOUS  LITERATURE 

The  previous  literature  has  provided  a  foundation  for 
knowledge  about  facial  expressions,  but  there  are  limitations 
to  these  studies.  The  Hess  (1997)  study  presented  emotional 
facial  expressions  in  a  single-task  format.  The  participants 
viewed  the  image  and  rated  it  on  the  emotionality  and 
intensity  that  they  perceived.  This  methodology  does  not 
clarify  whether  facial  emotion  decoding  is  truly 
resource/attention-free  as  neuropsychological  studies  suggest. 
A  dual-task  design  should  be  implemented  to  properly 
measure  attention  usage.  In  order  to  gain  this  data,  measures 
of  response  time,  accuracy,  and  subjective  workload  should  be 
used.  The  Hess  (1997)  study  also  measured  decoding  accuracy 
for  each  facial  expression  image  through  the  presentation  of 
several  emotion  scales.  The  participant  was  presented  with 
seven  emotional  labels,  which  they  manipulated  to  show  the 
intensity  of  emotion  for  the  previous  picture.  Instead  of 
presenting  seven  individual  scales,  it  seems  to  be  less 
complicated  to  present  one  scale  or  to  have  a  quick  input 
device  (keyboard  number  keys)  after  the  image  is  viewed.  The 
Hess  (1997)  study  presented  facial  expression  intensity  in 
increments  of  20  %  intensity.  This  intensity  scale  may  not 
provide  a  complete  spectrum  of  facial  expression  decoding 
data.  The  Orgeta  (2007)  study  also  presented  only  four 
intensity  levels.  The  number  of  intensity  levels  may  need  to  be 
increased  to  capture  a  more  accurate  representation  of 
people’s  ability  to  decode  facial  expression.  Another 
limitation  in  the  Orgeta  (2007)  study  was  that  the  facial  images
were  presented  in  order  of  increasing  intensity  as  the  participant  advanced
through  the  experiment.  This  method  may  have  led  to 
participants  forming  an  anticipation  bias  that  the  next  facial 
image  was  going  to  be  more  expressive.  The  purpose  of  the 
current  study  is  to  examine  the  user’s  ability  to  accurately 
decode  quantitative  value  from  a  facial  expression.  Previous 
studies  have  looked  at  humans’  ability  to  properly  decode
facial  expression  type  (Ekman  &  Friesen,  1975;  Ekman  &
Friesen,  1971),  intensity  (Tsurusawa,  Goto,  Mitsudome,
Nakashima,  &  Tobimatsu,  2008;  Hess,  1997),  and  the
effectiveness  of  Chernoff  faces  (Chernoff,  1973;  Tsurusawa,
Goto,  Mitsudome,  Nakashima,  &  Tobimatsu,  2008;  Morris, 
Ebert,  &  Rheingans,  2000).  However,  no  study  to  date  has 
fused  these  previously  listed  concepts  into  one  holistic  study; 
this  is  the  intent  of  the  current  study. 

OVERVIEW  OF  THE  STUDIES 

The  current  study  will  model  itself  in  some  areas 
after  Hess  (1997).  However,  our  study  will  use  the  dual-task 
paradigm  to  precisely  measure  the  attention-demanding 
characteristics  of  facial  displays.  The  current  study  will  use 
only  one  measurement  scale  (direct  key  entry)  after  each  trial 
to  eliminate  any  confusion  for  the  participants  about  what  the
scales  are  measuring.  This  will  also  allow  for  more  precise
response  time  data.  In  the  Orgeta  (2007)  study  the  facial
expressions  were  shown  in  increasing  order.  Chernoff  facial
expression  stimuli  will  be  shown  in  randomized  intensity
order  in  an  effort  to  avoid  any  biases  being  formed  by  the
participants.  The  Chernoff  faces  will  be  manipulated
differently  compared  to  previous  research  (Chernoff,  1973;
Tsurusawa,  Goto,  Mitsudome,  Nakashima,  &  Tobimatsu,
2008;  Morris,  Ebert,  &  Rheingans,  2000).  Only  the  mouth  will
be  manipulated  in  order  to  gain  understanding  about  the  effect
of  this  one  variable  on  decoding.  Finally,  the  current  study
will  use  a  facial  expression  intensity  scale  more  precise  than 
previous  research  (Hess,  1997;  Orgeta,  2007).  A  facial 
expression  scale  presenting  emotions  in  increments  of  10  % 
will  be  used.  Our  hypothesis  is  that  by  making  these 
modifications  the  current  study  will  be  able  to  address  the 
research  question  with  more  accuracy. 

METHOD 

Participants 

There  will  be  80  participants  (40  younger  adults,  40 
older  adults)  tested  for  the  current  study.  The  age  range  for 
younger  adults  will  be  18-24  years  old,  while  the  age  range  for 
older  adults  will  be  from  65-85  years  old. 

Design 

This  study  will  be  a  between-subjects,  2  (age  group)
×  2  (facial  expression  type)  ×  10  (facial  expression  intensity)
factorial  design.  The  dependent  variables  being  measured  are:
the  speed  (ms)  for  the  block  task,  the  proficiency  on  the  block
task  (number  of  blocks  cleared),  the  speed  (ms)  of  response  on
the  facial  expression  task,  the  number  of  “misses”  on  the  facial
expression  task,  and  the  accuracy  of  response  for  the  facial 
expression  rating.  Measures  of  subjective  workload  will  be 
collected  with  the  NASA-TLX  and  individual  cognitive  ability 
data  will  be  collected  with  a  battery  of  cognitive  abilities  tests. 
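
As a quick check on the size of the design, the factorial structure described above can be enumerated directly. The following Python sketch is illustrative only: the level labels are ours, the ten intensity levels are assumed to run from 0% to 90% in 10% steps (as in the stimulus description), and the equal split of the 80 participants across the four age-by-expression groups is an assumption rather than something the design statement specifies.

```python
from itertools import product

# Factor levels; the labels are illustrative, not taken from the study materials.
AGE_GROUPS = ("younger", "older")
EXPRESSION_TYPES = ("happy", "sad")
INTENSITIES = tuple(range(0, 100, 10))  # assumed: 0% (neutral) to 90% in 10% steps


def design_cells():
    """Return every (age group, expression type, intensity) combination."""
    return list(product(AGE_GROUPS, EXPRESSION_TYPES, INTENSITIES))


def participants_per_group(n_participants=80):
    """Participants per age-by-expression group, assuming equal allocation."""
    n_groups = len(AGE_GROUPS) * len(EXPRESSION_TYPES)
    return n_participants // n_groups


cells = design_cells()
print(len(cells))                # 2 x 2 x 10 = 40 cells
print(participants_per_group())  # 80 / 4 = 20 under equal allocation
```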

Task  and  Materials 

Participants  will  view  the  program  on  19-inch  LCD 
monitors  and  make  all  responses  using  the  keyboard.  They 
will  be  seated  in  office  chairs  about  18-24  inches  from  the 
screen  in  an  office  environment.  Participants  will  initially  take 
a  computerized  cognitive  abilities  test.  These  tests  include  the 
digit  symbol  test,  reverse  digit  span  test,  and  the  Shipley 
vocabulary  test.  These  tests  will  gather  information  on 
individual  abilities  such  as  working  memory,  perceptual 
speed,  and  vocabulary. 

The  primary  task  will  be  to  play  a  block  game  similar 
to  the  game  Tetris.  The  block  game  consists  of  moving
multi-colored  blocks.  The  main  objective  of  the  block  task  is  to
manipulate  the  blocks,  and  successfully  clear  them  using  the 
arrow  keys  and  space  bar.  The  blocks  appear  on  the  screen 
(moving  from  bottom  to  top)  as  the  participant  interacts  with 
the  program. 

The  secondary  task  will  be  to  identify  the  level  of
emotion  presented  on  a  computer-generated  character.  The 
facial  expression  stimuli  were  rendered  using  the  statistical 
program  R.  This  allowed  the  experimenter  to  have  systematic 
control  over  the  faces  and  increase  their  facial  expression 
intensity  as  desired.  The  facial  expression  stimuli  are  basic 
line  drawings  composed  of  black  lines  on  a  white  background. 
This  was  done  to  eliminate  any  confounding  variables  that 
could  appear  due  to  gender,  ethnicity,  or  age.  There  are  19 
stimuli  total:  9  happy  stimuli  (ranging  from  10%  to
90%  expressive),  9  sad  stimuli  (ranging  from  10%  to
90%  expressive),  and  one  neutral  stimulus  (0%  expressive).
The  dimensions  of  the  stimuli  are  170  pixels  by  250  pixels. 


Figure  1.  50%  happy  Chernoff  face.
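
The stimulus set described above can be sketched in a few lines. The original faces were rendered in R; the Python below is not that code, only a hypothetical illustration of how the 19 stimuli could be enumerated when the mouth is the sole manipulated feature. The `Stimulus` class, the `mouth_curvature` function, and the linear mapping from intensity to curvature are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    expression: str  # "happy", "sad", or "neutral"
    intensity: int   # percent expressive (0 for neutral, 10-90 otherwise)


def build_stimuli():
    """Enumerate one neutral face plus nine happy and nine sad faces."""
    stimuli = [Stimulus("neutral", 0)]
    for expression in ("happy", "sad"):
        for intensity in range(10, 100, 10):
            stimuli.append(Stimulus(expression, intensity))
    return stimuli


def mouth_curvature(stimulus, max_curvature=1.0):
    """Signed mouth curvature: positive is upturned, negative is downturned.

    The linear scaling with intensity is an assumption for illustration.
    """
    sign = {"happy": 1, "sad": -1, "neutral": 0}[stimulus.expression]
    return sign * max_curvature * stimulus.intensity / 100


stimuli = build_stimuli()
print(len(stimuli))  # 9 happy + 9 sad + 1 neutral = 19
```

Keeping every other facial feature fixed means that any difference in decoding between two stimuli can be attributed to the mouth alone.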

Procedure 

Participants  will  be  randomly  assigned  to 
experimental  conditions  prior  to  the  experiment.  The 
participants  will  be  given  an  informed  consent  document 
before  any  testing  is  conducted.  The  participant  will  then  take 
a  battery  of  cognitive  abilities  tests.  Next,  the  experiment  will 
be  presented  in  three  phases.  The  participants  will  complete 
two  separate  single-tasks  (block  task  and  facial  expression 
task)  to  record  baseline  data  on  their  abilities  in  each  task.  To 
examine  the  attentional  demands  of  decoding  Chernoff  faces, 
participants  will  then  engage  in  a  dual-task.  The  primary  task 
will  be  the  block  task.  This  spatial-manipulation  task  will  be 
relatively  cognitively  taxing  on  the  participants.  The 
secondary  task  will  be  the  facial  expression  task.  This  task  will 
presumably  be  fairly  automatic  for  the  participants  and  will 
require  little  to  no  cognitive  resources. 

In  phase  1,  the  participant  will  perform  the  block 
task  in  a  single-task  environment.  The  participant  will  have  to 
reach  a  pre-set  score  (based  on  number  of  successful 
manipulations)  to  complete  the  task.  In  phase  2  of  the 
experiment,  the  participants  will  be  asked  to  respond  to  facial 
expressions  that  are  flashed  on  the  computer  screen.  The 
participant  will  be  in  one  of  two  facial  expression  conditions 
(happy  or  sad).  Once  phase  2  begins,  the  facial  expression  will 
appear  in  the  window  for  three  to  five  seconds  and  then 
disappear.  The  facial  expressions  will  be  shown  in  a 
randomized  order  in  regard  to  their  intensity  level.  During  this 
time  interval  the  participant  will  try  to  respond  to  the  facial 
expression  using  the  number  keys.  If  the  participant  does  not 
hit  a  number  key  before  time  has  elapsed  then  a  “miss”  will  be 
recorded.  Regardless  of  whether  the  participant  has  responded 
or  missed  making  a  response,  after  three  to  five  seconds 
(randomized  appearance  time)  the  screen  will  go  back  to  being 
blank  until  the  next  trial.  There  will  be  60  trials  in  each 
condition.  In  phase  3,  the  participant  will  be  exposed  to  both 


phases  1  and  2  simultaneously.  This  will  create  a  dual-task 
situation.  After  the  participant  has  completed  the  experiment 
the  computer  will  display  the  NASA-TLX  questionnaire  for 
completion. 
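
The phase 2 trial structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the experiment software: it assumes that each of ten intensity levels (0% to 90%) appears equally often across the 60 trials and that exposure times are drawn uniformly between three and five seconds. The function and field names are hypothetical.

```python
import random

INTENSITY_LEVELS = list(range(0, 100, 10))  # assumed: 0% (neutral) to 90%


def make_trial_schedule(expression, n_trials=60, seed=None):
    """Build a randomized list of trials for one expression condition."""
    if n_trials % len(INTENSITY_LEVELS) != 0:
        raise ValueError("n_trials must be a multiple of the number of levels")
    rng = random.Random(seed)
    # Assumed balancing: every intensity level is repeated equally often.
    intensities = INTENSITY_LEVELS * (n_trials // len(INTENSITY_LEVELS))
    rng.shuffle(intensities)  # randomized intensity order
    return [
        {
            "expression": expression if level > 0 else "neutral",
            "intensity": level,
            "exposure_s": rng.uniform(3.0, 5.0),  # randomized appearance time
        }
        for level in intensities
    ]


def classify_trial(trial, response_time_s):
    """Return "miss" when no key press arrived before the face disappeared."""
    if response_time_s is None or response_time_s > trial["exposure_s"]:
        return "miss"
    return "response"


schedule = make_trial_schedule("happy", seed=1)
print(len(schedule))  # 60 trials
```

Shuffling a balanced list, rather than sampling each trial independently, keeps the number of presentations per intensity level fixed while still preventing the anticipation bias discussed earlier.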

PREDICTED  RESULTS 

The  first  hypothesis  (H1a)  is  that  participants’
performance  (i.e.,  accuracy  and  speed)  will  be  above  chance 
levels  for  facial  emotion  decoding  in  the  single-task  phase.  We 
are  assuming  that  the  younger  and  older  participants  will  be 
able  to  rely  on  previously  acquired  innate  facial  expression 
knowledge  to  achieve  high  accuracy  decoding.  We  expect  to 
see  a  linear  trend  between  the  actual  intensity  of  the  emotional 
facial  expression  presented  and  the  perceived  intensity  of  the 
emotional  facial  expression.  This  hypothesis  is  based  on  the 
results  of  Hess  (1997)  and  Orgeta  (2007).  The  second  part  of 
our  hypothesis  (H1b)  is  that  all  participants’  performance  on
the  facial  decoding  task  will  be  impaired  by  the  dual-task
environment.  As  a  consequence  of  divided  attention,  we
expect  facial  expression  response  time  and  misses  to  increase 
in  the  dual-task  environment.  However,  the  current  study 
hypothesizes  that  one  condition  will  present  itself  as  more 
decodable  to  the  participant  in  the  dual-task.  This  is  based  on 
the  supposed  automatic  nature  of  the  facial  expression  task 
and  also  the  effects  of  facial  expression  type  on  decoding.  If 
this  modality  is  actually  resource-free  and  allows  for  ease  of 
decoding  as  some  studies  indicate  (Whalen,  1998;  Lee,  2006; 
Morris,  1998),  then  dual-task  performance  should  not 
significantly  deviate  from  single-task  performance  for  the 
participant  when  the  most  effective  facial  expression  type  is 
presented,  which  we  hypothesize  to  be  the  happy  condition. 
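
The two quantities behind H1a and H1b can be made concrete with a short sketch. The Python below is illustrative only and does not represent the planned analysis: it fits an ordinary least-squares line of perceived intensity on actual intensity (a slope near 1 would indicate the expected linear trend) and computes a simple dual-task cost as the difference in mean response time. The function names and the example numbers are hypothetical.

```python
def least_squares_line(actual, perceived):
    """Return (slope, intercept) of perceived intensity regressed on actual."""
    n = len(actual)
    if n < 2 or n != len(perceived):
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(actual) / n
    mean_y = sum(perceived) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    if sxx == 0:
        raise ValueError("actual intensities must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, perceived))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dual_task_cost(single_rt_ms, dual_rt_ms):
    """Mean dual-task RT minus mean single-task RT (positive means slower)."""
    mean_single = sum(single_rt_ms) / len(single_rt_ms)
    mean_dual = sum(dual_rt_ms) / len(dual_rt_ms)
    return mean_dual - mean_single


# Hypothetical ratings from one participant, in percent.
slope, intercept = least_squares_line([10, 30, 50, 70, 90], [20, 30, 50, 60, 90])
print(slope, intercept)

# Hypothetical response times in milliseconds.
print(dual_task_cost([1000, 1200], [1300, 1500]))
```

Under H1b the cost is expected to be positive overall; if facial expression decoding were truly resource-free, the cost for the most effective expression type would be close to zero.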

The  second  hypothesis  (H2)  is  that  all  participants 
will  show  a  difference  in  facial  recognition  accuracy  scores 
between  the  two  conditions  (i.e.,  happy  and  sad).  We  are 
expecting  a  main  effect  for  condition.  It  is  hypothesized  that 
participants  will  have  larger  actual  versus  perceived 
differences  (i.e.,  worse  facial  expression  recognition)  for  the 
sad  condition.  This  hypothesis  is  drawn  from  the 
socioemotional  selectivity  theory  and  research  which  supports 
positive  expressions  as  more  identifiable,  referred  to  as  the
“happy  face  advantage”  (Orgeta,  2007;  Ekman  &  Friesen,
1971).
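
One way to express the "actual versus perceived" difference used in H2 is a mean absolute error per condition. The sketch below is an assumption about how such a score could be computed, not the scoring procedure of the study, and the example ratings are invented for illustration.

```python
def mean_absolute_error(trials):
    """Average |actual - perceived| over (actual, perceived) pairs in percent."""
    trials = list(trials)
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(abs(actual - perceived) for actual, perceived in trials) / len(trials)


# Hypothetical (actual, perceived) intensity pairs for one participant.
happy_trials = [(50, 50), (70, 60), (30, 40)]
sad_trials = [(50, 30), (70, 40), (30, 60)]

# H2 predicts a larger error (worse recognition) in the sad condition.
print(mean_absolute_error(happy_trials) < mean_absolute_error(sad_trials))
```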

The  third  hypothesis  (H3)  is  that  the  variables  of  age 
and  condition  will  have  a  significant  effect  on  the  accuracy  of 
the  participants.  We  are  expecting  a  two-way  interaction 
between  age  and  emotion  of  presented  face  on  accuracy.  Thus, 
when  older  adults  are  in  the  happy  condition  their  accuracy 
scores  will  not  be  significantly  different  (i.e.,  differences  in 
actual  and  perceived  facial  expression  intensities)  than 
younger  adults  in  the  happy  condition.  However,  we  expect  to 
see  younger  adults  produce  significantly  better  performance 
scores  in  the  sad  condition  versus  the  older  adults  in  the  sad 
condition.  This  hypothesis  is  driven  by  the  socioemotional 
selectivity  theory.  It  is  expected  that  older  adults  will  have 
significantly  better  performance  in  the  happy  condition  than  in 
the  sad  condition,  while  younger  adults  will  yield  less 
significant  performance  differences  between  the  conditions. 

DISCUSSION


The  current  research  study  is  attempting  to  clarify  the 
issue  of  whether  emotion  can  be  used  as  a  reliable  and 
resource-free  modality.  This  research  is  relevant  not  just  for 
applied  psychology,  but  also  for  the  pool  of  knowledge  in 
psychology  used  for  future  studies.  The  potential  importance 
of  this  research  will  continue  to  increase  as  the  use  of  ambient 
displays,  automation,  and  system  confidence  increases  in  our 
society.  The  ability  for  facial  expression  to  be  used  in 
automation  as  a  viable  communication  tool  may  be  similar  to 
that  of  the  visual  and  spatial  modalities  presented  in  multiple 
resource  theory  (Lee,  2006).  If  the  current  study  discovers  that 
the  emotional  modality  is  capable  of  reliable  and  efficient 
information  processing  during  a  dual-task  situation,  then  the 
concept  of  emotionally  transparent  automation  may  be 
implemented  in  future  automation.  One  of  the  main  goals  of 
HAI  is  easily  understood  output  from  the  automation  for  the 
human  to  interpret.  This  allows  the  human  to  understand 
automation  behavior  and  predict  future  behavior.  It  has  been 
noted  that  this  process  can  be  hindered  by  advanced 
automation.  To  help  alleviate  this  disconnect  between  the 
automation  and  human,  the  use  of  facial  expressions  could 
bring  interpretation  clarity  for  the  human  (Lee,  2006).  This 
clarity  would  allow  for  properly  calibrated  trust  to  be  formed 
between  the  human  and  automation,  and  ultimately  allow  the 
human  to  have  a  realistic  idea  of  the  system’s  confidence  and 
interact  with  it  accordingly.  This  is  important  because 
automation  that  is  assisting  in  critical  situations  (health
management,  aviation,  nuclear  power  plants,  etc.)  needs  to  be 
trusted  and  used  properly  by  the  user.  Widening  the  research 
question  back  out  to  ambient  displays,  it  is  evident  that  many 
domains  could  benefit  from  research  explaining  how  facial 
expressions  aid  in  information  display.  One  interesting  point 
proposed  by  Lee  (2006)  is  the  lack  of  attention  that  is  required 
for  emotional  stimuli  processing.  The  current  study  is  trying  to 
build  on  this  idea  and  show  that  the  use  of  facial  expressions 
to  deliver  information  requires  almost  no  attention  from  the 
human.  This  finding  would  give  evidence  that  humans  are
already  equipped  with  a  resource-free  modality  that  can  be 
used  to  gain  information.  One  potential  benefit  of  an  innate 
modality  for  information  processing  would  be  the  little  to  no 
training  required  for  people  to  properly  access  this  tool.  Due  to 
the  large  variety  of  users  for  most  systems,  implementing 
effective  ambient  displays  can  be  difficult.  However,  if  facial 
expression  decoding  proves  to  be  an  effective  information 
processing  method,  then  it  could  be  critical  to  making  ambient 
displays  successful  across  demographic  categories.  The 
current  study  could  be  used  to  show  a  unifying  aspect  of 
human  information  processing  that  could  be  applied  to 
research  in  multiple  disciplines.  In  sum,  the  current  study  may 
find  that  the  key  to  creating  a  viable  ambient  display  is  found 
within  the  human  brain.  To  capitalize  on  this  fact,  ambient 
displays  should  be  designed  to  display  emotional  facial 
expressions  to  take  advantage  of  this  untapped  modality. 

ABSTRACT


Ambient  displays  are  used  to  provide  information  to  users  in  a  non-distracting 
manner.  The  purpose  of  this  research  was  to  examine  the  efficacy  of  facial  expressions  as 
a  method  of  conveying  information  to  users  in  an  unobtrusive  way.  Facial  expression 
recognition  requires  very  little  if  any  conscious  attention  from  the  user,  which  makes  it  an 
excellent  candidate  for  the  ambient  presentation  of  information.  Specifically,  the  current 
study  quantified  the  amount  of  attention  required  to  decode  and  recognize  various  facial 
expressions.  The  current  study  assessed  the  attention-demanding  characteristics  of  facial 
expressions  using  the  dual-task  experiment  paradigm.  Results  from  the  experiment 
suggest  that  Chernoff  facial  expressions  are  decoded  with  the  most  accuracy  when  happy 
facial  expressions  are  used.  There  was  also  an  age-effect  on  decoding  accuracy; 
indicating  younger  adults  had  higher  facial  expression  decoding  performance  compared 
to  older  adults.  The  observed  decoding  advantages  for  happy  facial  expressions  and 
younger  adults  in  the  single-task  were  maintained  in  the  dual-task.  The  dual-task 
paradigm  revealed  that  the  decoding  of  Chernoff  facial  expressions  required  more 
attention  (i.e.,  longer  response  times  and  more  face  misses)  than  hypothesized,  and  did 
not  evoke  attention-free  decoding.  Chernoff  facial  expressions  do  not  appear  to  be  good 
ambient  displays  due  to  their  attention-demanding  nature. 

INTRODUCTION


Ambient  displays  can  take  many  forms,  such  as  the  battery  meter  icon  of  a
computer  interface  or  a  string  dangling  from  the  ceiling  to  represent  traffic  on  a
computer  network  (Weiser  &  Brown,  1995).  These  examples  are  considered  “ambient”
because  they  convey  information  to  the  user  without  being  substantially  taxing  on 
cognitive  faculties  (i.e.,  they  are  in  the  background  and  do  not  require  the  user  to  change 
focus  or  switch  attention).  Several  important  characteristics  have  been  identified  for  the 
design  of  a  good  ambient  display.  Examples  of  these  characteristics  include:  providing 
useful  and  relevant  information,  having  a  sufficient  information  design,  using  consistent 
and  intuitive  mapping,  and  appropriate  matching  between  the  system  and  the  real  world 
(Mankoff,  Dey,  Hsieh,  Kientz,  Lederer,  &  Ames,  2003).  If  these  characteristics  are 
adequately  fulfilled  by  facial  expressions,  then  facial  expressions  could  be  considered  a 
good  form  of  ambient  display.  The  purpose  of  this  study  is  to  determine  if  face  stimuli 
can  serve  as  ambient  indicators  of  quantitative  information. 

One  situation  where  ambient  displays  may  be  helpful  is  in  human-automation 
interaction  (HAI).  In  some  HAIs,  users  may  become  unaware  of  the  hidden  decision 
making  processes  or  outcomes  of  automation.  They  may  also  lose  track  of  the 
automation’s  reliability  over  time  (i.e.,  forget  how  reliable  or  unreliable  it  has  been  in  the 
past).  Such  gaps  in  information  (uncertainty  about  current  processes,  past  reliability)  can  lead  to
fluctuations  in  trust  that  may  not  be  justified  (un-calibrated  trust);  that  is,  trust  that  may  be
unwarranted.  Un-calibrated  trust  can  manifest  itself  as  continued  use  of  unreliable
automation  (misuse)  or  unwarranted  discontinued  use  of  reliable  automation  (disuse),  both
of  which  cause  non-optimal  HAIs  (Parasuraman,  1997). 

One  way  in  which  an  automated  system  can  encourage  proper  calibration  is  by 
presenting  as  much  information  about  its  operation  as  possible.  For  example,  it  could 
present  its  own  confidence  in  its  recommendation,  so-called  “system  confidence”,  or  it
could  present  a  historical  picture  of  its  own  reliability  (both  kinds  of  information  are
easily  accessible  to  a  system).  This  concept  can  be  categorized  in  the  ambient  display
heuristic  of  useful  and  relevant  information.  For  example,  if  the  system  is  working  from 
faulty  data,  it  will  weight  its  advice  as  potentially  unreliable.  Presenting  critical 
information,  such  as  system  confidence,  is  a  way  of  diminishing  the  uncertainty  that  can 
exist  in  HAIs  (Bubb-Lewis  &  Scerbo,  1997).  Trust  is  a  malleable  variable  that  can  be 
shaped  through  interactions  with  a  system  (Antifakos,  Kern,  Schiele,  &  Schwaninger, 
2005).  If  a  system  is  presenting  the  operator  with  its  system  confidence  level,  then  the 
operator  will  be  able  to  build  a  more  appropriate  trust  relationship  with  the  automation. 
However,  this  presentation  needs  to  be  salient  and  the  automation  state  indicator  should 
not  add  attentional  demands  to  the  user  (Parasuraman,  1997).  Some  previous  research  has 
indicated  that  methods  such  as  tactile  output  and  auditory  output  may  be  helpful  in 
conveying  system  confidence  (Wisneski,  1999;  Poupyrev,  Maruyama,  &  Rekimoto,
2002;  Sawhney  &  Schmandt,  2000).  While  these  modalities  are  novel  in  certain
capacities,  a  less  intrusive  and  less  attention  demanding  modality  would  be  more 
beneficial  to  users.  Thus,  the  ideal  stimulus  display  type  would  be  one  that  provides  the 
user  with  meaningful  information,  while  not  becoming  a  distraction  or  a  drain  on  the 
user’s  attention  (Antifakos,  Kern,  Schiele,  &  Schwaninger,  2005).  Coding  information
as  emotional  expression  in  human-like  faces  may  fulfill  this  role. 

Human  Emotion  Decoding 

Research  has  shown  that  humans  have  an  ability  to  recognize  emotional  facial 
expressions  with  little  attention  allocation.  Batty  and  Taylor  (2003)  had  participants 
complete  an  implicit  emotional  task,  which  involved  the  presentation  of  target  stimuli 
(non-faces)  in  a  sequence  with  emotional  faces.  This  experimental  design  allowed  the 
researchers  to  test  the  participants’  event-related  potentials  (ERPs)  while  viewing 
emotional  faces,  but  without  explicitly  instructing  the  participant  to  look  at  the  emotional 
faces.  Through  analysis  of  the  ERPs,  it  was  found  that  participants  were  processing  the 
emotional  face  stimuli  quickly  (i.e.,  M  =  94  ms  for  P1  component;  M  =  140  ms  for  N170
component).  The  results  of  this  analysis  of  the  P1  and  N170  components  suggest  that
participants  were  processing  the  emotional  face  stimuli  pre-attentively  (Batty  &  Taylor, 
2003).  Other  studies  have  supported  that  tasks  involving  affective  (emotional)  stimuli 
may  be  responded  to  without  awareness  (Whalen,  1998).  An  fMRI  study  showed  that 
participants  experienced  increased  amygdala  activation  even  when  they  were  unaware  of 
the  presentation  of  emotional  facial  expressions  (Whalen,  1998).  The  amygdala  is  a  key 
area  of  the  brain  for  the  emotional  facial  recognition  process.  Previous  research  on 
animals  has  provided  evidence  that  the  amygdala  is  the  brain  area  where  facial  and 
emotional  processing  occurs.  A  subsequent  study  built  off  of  these  findings  and  found  the 
amygdala  was  crucial  for  humans’  decoding  of  facial  affect,  especially  the  emotion  of 
fear  (Adolphs,  Tranel,  Damasio,  &  Damasio,  1994).  The  conclusions  of  Whalen  (1998) 
make  a  case  that  explicit  knowledge  is  unnecessary  for  a  person  to  process  emotional
facial  expressions.  This  process  occurs  below  the  level  of  conscious  awareness,  or  in 
other  terms,  automatically  (Morris,  1998;  Whalen,  1998).  It  can  be  inferred  from  these 
studies,  that  the  use  of  facial  expressions  as  ambient  displays  should  not  add  cognitive 
load  and  would  enforce  the  heuristic  of  consistent  and  intuitive  mapping. 

Neuroimaging  studies  have  supported  the  notion  that  the  emotional  processing  of 
faces  is  a  more  effective  pathway  than  the  processing  of  other  stimuli.  A  previous  study 
compared  the  automatic  processing  of  emotional  facial  expressions  versus  emotional 
words.  Rellecke  (2011)  hypothesized  that  facial  expressions  would  be  encoded  more
automatically  than  words,  due  to  their  perceptual  features  and  humans’  natural  ability  to 
encode  them.  This  study  was  novel  because  it  took  two  theoretically  attention-free 
emotional  processing  stimuli  (i.e.,  faces  and  words),  and  compared  their  efficiency  and 
effect.  The  degree  of  encoding  automaticity  was  being  tested  for  each  of  these  stimuli. 
Based  on  the  results  of  the  electroencephalogram  (EEG),  the  event-related  brain 
potentials  (ERPs)  recorded  for  the  facial  expression  conditions  were  found  to  have  a 
prolonged  effect  on  the  brain.  This  finding  suggests  that  emotional  facial  expression
processing  is  automated  to  a  greater  extent  than  emotional  word  processing.
Rellecke  (2011)  discusses  the  potential  necessity  for  preconditions  for  the  high  automatic
processing  of  emotional  words.  This  was  apparent  because  the  two  stimuli  were  tested  in 
the  same  superficial  stimulus  analysis  task,  but  only  one  (i.e.,  facial  expression)  led  to 
advanced  pre-attentive  processing.  Facial  expression  seems  to  be  a  stimulus  that  needs  no
prompting  or  preconditions  to  allow  fast,  but  also  meaningful  processing  (Rellecke,
2011).  Data  analysis  found  that  happy  faces  were  decoded  earlier  than  other  faces  (i.e.,
50-100  ms).  This  supports  the  theory  that  happy  faces  are  advantageous  in  the  early 
stages  of  emotional  processing  and  may  be  instrumental  in  attention-free  encoding.  Also, 
data  showed  that  angry  faces  were  advantageous  for  later  decoding  (i.e.,  150-450  ms). 
This  coincides  with  previous  research  that  states  angry  expressions,  or  threat-related 
expressions,  have  prolonged  effects  on  the  brain  (Rellecke,  2011).  These  differences  in 
emotion  type  on  ERPs  show  that  there  may  be  a  specific  type  of  emotion  that  elicits  faster 
decoding  for  humans. 

Calvo  and  Lundqvist  (2008)  found  the  facial  expression  of  happiness  to  be  the 
stimulus  best  decoded  by  participants.  Participants  were  presented  with  a  happy  facial
expression  and  responded  more  accurately  in  its  identification,  and  rarely  mis-identified 
the  expression  as  another  emotion  (i.e.,  neutral,  angry,  sad,  disgusted,  surprised,  fearful). 
Response  times  for  neutral  and  happy  facial  expressions  were  the  fastest  among  all 
expressions.  This  indicates  a  fast,  automatic  form  of  facial  expression  decoding.  Calvo 
and  Lundqvist  (2008)  conducted  a  second  experiment  where  the  participants  were 
exposed  to  the  stimuli  in  a  “fixed-pace  mode”.  Participants  viewed  the  stimuli  at  fixed 
exposures  of  25,  50,  100,  250,  and  500  milliseconds.  The  results  of  this  experiment 
paralleled  the  original  findings,  showing  that  the  expression  of  happiness  was 
consistently  identified  at  a  high  accuracy  level  (M  =  98.4%)  regardless  of  the  exposure
time.  Having  additional  time  to  decode  the  happy  expression  did  not  result  in  accuracy 
gains.  Thus,  it  can  be  inferred  that  humans  are  very  quick  and  accurate  at  decoding  happy 
facial  expressions.  With  indications  that  facial  expressions  are  an  effective  pathway  for 
the  decoding  of  emotional  data,  we  want  to  investigate  the  limits  and  capabilities  of  this
potentially  new  modality  for  communication  of  quantitative  information. 

In  order  for  facial  expression  to  be  used  as  a  means  of  relaying  quantitative 
system/automation  information,  we  must  know  if  users  are  able  to  properly  and 
consistently  decode  facial  expression  intensity  into  a  consistent  quantitative  value  (e.g.,  a 
specific  smiling  face  represents  90%).  Hess  (1997)  investigated  the  issue  of  facial 
expression  decoding  with  varying  degrees  of  intensity  for  different  emotional  categories. 
When  participants  were  given  an  emotional  facial  expression  stimulus,  they  were 
accurate  at  perceiving  its  physical  intensity;  there  was  a  linear  trend  for  the  perceived 
intensity  of  the  expression  by  the  human  versus  the  actual  physical  intensity  of  the 
emotional  facial  expression  (Hess,  1997).  Analysis  showed  that  when  a  facial  expression 
was  more  intense  (e.g.,  80%  and  100%  expressive)  the  participant  had  a  more  accurate 
perception  of  the  emotional  stimulus.  Happy  expressions  were  the  most  recognizable 
across  all  intensity  levels  (Hess,  1997).  This  finding  supports  happy  facial  expressions  as 
one  of  the  most  familiar  and  perhaps  easiest  of  facial  expressions  to  decode  for  humans. 
Bartneck  and  Reichenbach  (2005)  performed  a  similar  study  that  sought  to  determine 
how  the  actual  intensity  of  facial  stimuli  affected  perceived  intensity  and  accuracy.  It  was 
found  that  participants  displayed  high  accuracy  in  perceiving  happy  face  intensity,  high 
recognition accuracy for happy faces, and gave low task difficulty ratings for happy faces.
It  was  also  found  that  the  happy  facial  expressions  led  to  the  fastest  ceiling  effect  for 
recognition  accuracy.  Participants  were  able  to  recognize  the  happy  facial  expression 
starting at just 10% intensity. This again points to quick decoding of happy facial expressions.
Understanding the effects that different emotional facial expressions and their intensities
have  on  humans’  ability  to  decode  is  critical  in  determining  the  most  effective  stimuli  to 
use  as  ambient  displays. 

Chernoff Faces

Chernoff  faces  were  created  to  represent  multivariate  data  in  a  way  that  would 
allow  the  viewer  to  gain  information  in  a  quick,  yet  complete  manner.  For  example,  some 
of  the  original  Chernoff  faces  were  used  to  represent  fossil  data.  The  Chernoff  faces 
displayed  information  pertinent  to  the  fossils  (i.e.,  inner  diameter  of  embryonic  chamber, 
total  number  of  whorls,  maximum  height  of  chambers  in  last  whorl,  etc.)  through 
variations in facial features including, but not limited to, head shape, eye size, mouth size/shape,
and eyebrow size/slant. Chernoff’s rationale was that due to the extreme familiarity of
faces,  people  would  easily  detect  differences  in  the  configuration  of  a  face,  even  if  the 
differences  were  small  ones  (Chernoff,  1973).  It  was  expected  that  people  would  at  least 
be  able  to  examine  faces  more  quickly  than  examining  a  row  of  numbers.  Assuming  that 
this  is  true,  a  schematic  facial  expression  should  act  as  a  superb  source  of  information 
output. 

Chernoff  faces  have  up  to  18  characteristics  that  can  be  manipulated  (Nelson, 
2007).  When  representing  multivariate  data  (e.g.,  the  fossil  data)  it  is  beneficial  to  have 
multiple  facial  elements  that  can  be  manipulated  and  used  for  representing  various  data. 
However,  when  representing  univariate  data  (i.e.,  a  single  percentage  score)  it  seems  that 
having  a  lower  number  of  manipulated  facial  features  is  more  beneficial.  Therefore,  it 
could be problematic to have several individual facial elements for the human to properly
decode. If a human naturally decodes a face as a whole rather than in parts, it may be
counter-intuitive to present them with a face that requires the decoding of several features
(parts) of the face. As Montello and Gray (2005) state, it is more beneficial to have a
stimulus  that  communicates  information  univariately  rather  than  multivariately  when  the 
goal  is  to  give  the  user  a  single  quantity.  A  pseudo-Chernoff  face  may  be  a  remedy  for 
this dilemma (Montello & Gray, 2005). This “pseudo-Chernoff” face could be created by
systematically  manipulating  one  facial  characteristic,  while  holding  all  others  constant. 

To  properly  convey  a  simple  quantitative  score  the  Chernoff  face  may  only  need  to  have 
one  facial  characteristic  manipulated.  Through  this  manipulation,  the  human  may  be 
more  apt  to  decode  the  Chernoff  face  accurately  and  quickly,  while  noticing  subtle 
changes  (Kabulov,  1992). 
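
As a rough illustration of manipulating a single facial characteristic while holding all others constant, the sketch below maps an intensity value onto the curvature of a schematic mouth. It is a minimal sketch only: the parabolic mouth shape, the function name mouth_curve, and the parameters half_width, max_depth, and n_points are assumptions made for illustration and are not taken from Chernoff (1973) or from the stimuli used in the current study.

```python
def mouth_curve(intensity, expression="happy", half_width=1.0,
                max_depth=0.5, n_points=11):
    """Return (x, y) points of a parabolic mouth for an intensity in [0, 1].

    Hypothetical mapping: only the curvature changes with intensity;
    mouth width and position stay constant.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    if expression not in ("happy", "sad"):
        raise ValueError("expression must be 'happy' or 'sad'")
    # Happy: mouth corners rise above the center; sad: corners drop below it.
    sign = 1.0 if expression == "happy" else -1.0
    depth = sign * max_depth * intensity
    points = []
    for i in range(n_points):
        x = -half_width + 2.0 * half_width * i / (n_points - 1)
        y = depth * (x / half_width) ** 2
        points.append((x, y))
    return points


# Ten equally spaced levels (0% to 90%) for one expression condition.
happy_set = [mouth_curve(level / 10.0, "happy") for level in range(10)]
```

With this mapping, the 0% face has a flat mouth and each step changes only the height of the mouth corners, so any difference a viewer perceives between two faces is attributable to that one feature.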

The issue of whether interpreting Chernoff faces is a relatively less attention-demanding
task is of primary importance to the current study. Previous studies have
investigated  the  effectiveness  of  Chernoff  faces  as  a  pre-attentive  stimulus  with  mixed 
results.  A  study  concluded  that  Chernoff  faces  are  not  processed  pre-attentively,  and  do 
not  benefit  users  more  than  other  modes  of  visual  information  display  (Morris,  Ebert,  & 
Rheingans,  2000).  The  process  of  identifying  the  characteristics  (eyebrow  slant,  eye  size, 
nose  length)  of  the  Chernoff  face  was  said  to  be  a  serial  process.  Participants’  accuracy  of 
target stimuli identification improved when they were given more time and fewer
distractors, indicating that the task was not pre-attentive (Morris, Ebert, & Rheingans,
2000).  A  similar  study  investigated  data  visualization  and  used  Chernoff  faces  as  one  of 
the “glyph stimuli” to discover which data visualizations were the most effective (Lee,
Reilly, & Butavicius, 2003). Glyphs are data visualizations that are characterized by their
attempt  to  display  multivariate  data  through  the  manipulation  of  features  on  the  glyph 
that  correspond  to  raw  data.  It  was  found  that  participants  had  lower  accuracy  scores  and 
took  longer  to  answer  questions  when  exposed  to  the  glyph  stimuli  (Lee,  Reilly,  & 
Butavicius,  2003).  This  indicates  a  serial  processing  of  information  from  the  Chernoff 
faces,  which  is  in  agreement  with  the  findings  of  Morris,  Ebert,  &  Rheingans  (2000). 

A  study  investigating  perceptual  sensitivities  found  that  children  process  Chernoff 
faces differently than adults (Tsurusawa, Goto, Mitsudome, Nakashima, & Tobimatsu,
2007).  Children  focus  more  on  individual  features,  while  adults  process  a  face  in  a  more 
holistic  pattern.  These  findings  seem  to  be  discrepant  with  the  previously  mentioned 
studies.  Perhaps  adults  do  not  decode  Chernoff  faces  to  the  degree  of  serial  processing  as 
suggested by other studies. If adults decode in a faster, more parallel manner, then
Chernoff  faces  may  allow  for  pre-attentive  processing.  Of  particular  interest  is  how  the 
participants  differed  on  their  interpretation  of  the  mouth  angle  presented.  Children 
significantly  differed  from  adults  in  their  evaluation  of  the  Chernoff  face  as  a  function  of 
the  angle  of  the  stimuli’s  mouth.  Children  evaluated  the  faces  as  more  emotional  as  the 
curvature  of  the  mouth  changed,  while  the  adults  were  significantly  below  the  children’s 
evaluation  score.  Supposedly,  this  is  a  consequence  of  children’s  lack  of  holistic  face 
processing  ability  (Tsurusawa,  Goto,  Mitsudome,  Nakashima,  &  Tobimatsu,  2007).  An 
additional  finding  bolstered  Chernoff  faces’  potential  value  as  a  quantitative  display.  This 
was  the  participants’  ability  to  evaluate  the  stimuli  in  discrete  steps  (Tsurusawa,  Goto, 
Mitsudome, Nakashima, & Tobimatsu, 2007). Basically, participants could follow the
incremental facial feature changes in the Chernoff faces, similar to the hypothesis by
Chernoff  (1973).  Although  children  and  adults  may  process  Chernoff  faces  differently,  it 
can  be  inferred  that  Chernoff  faces  can  demonstrate  human  facial  expressions  effectively. 

A  previous  study  used  schematic  faces  (line  faces  similar  to  Chernoff  faces)  as 
stimuli  to  determine  whether  the  “anger  superiority  effect”  was  apparent  while  using  a 
visual  search  paradigm  (Ohman,  Lundqvist,  &  Esteves,  2001).  The  study  found 
schematic  faces  to  be  identified  quickly  and  accurately,  with  schematic  faces  representing 
anger/threatening  emotion  leading  to  the  most  pre-attentive  reaction  times.  The  visual 
search  paradigm  was  reconfigured  throughout  the  experiment  by  adding  more  distractor 
stimuli.  This  was  done  in  an  effort  to  make  a  more  difficult  visual  search  task,  which 
would  test  for  serial  versus  parallel  search.  Following  each  of  these  iterations,  the 
threatening  facial  expression  was  shown  to  be  the  most  decodable  (faster  and  more 
accurate) stimulus (Ohman, Lundqvist, & Esteves, 2001). This is important because it
indicates  that  the  threatening  schematic  face  is  processed  in  parallel,  or  without  using 
much  attention.  The  results  of  this  study  show  that  schematic  faces  can  be  processed  in 
parallel  and  that  there  is  potentially  an  “anger  superiority  effect”  for  these  types  of  stimuli 
(Ohman,  Lundqvist,  &  Esteves,  2001). 

If  Chernoff  faces  are  manipulated  properly,  giving  the  right  amount  of  useful 
information,  they  will  fulfill  the  heuristic  of  sufficient  information  design  as  an  ambient 
display.  To  reiterate,  the  main  issue  concerning  Chernoff  faces  is  whether  they  can  be 
interpreted pre-attentively, with minimal attentional resources. Once this issue is
understood with more clarity, the efficacy of facial expressions in the form of Chernoff
faces as ambient displays will be evident.

Age-Related  and  Cultural  Effects  on  Decoding 

Despite  the  ease  with  which  humans  are  able  to  decode  emotional  facial 
expressions, this ability is still moderated by age. Age can alter a person’s ability to correctly
perceive  and  understand  the  facial  expression  that  is  presented  to  them. 
Neuropsychological  research  has  shown  that  age-related  issues  in  facial  expression 
decoding  may  be  a  result  of  problems  with  the  medial  temporal  lobe  (Orgeta  &  Phillips, 
2007). The amygdala is housed here, which is consistent with previous research that
suggests the amygdala is necessary for facial expression decoding (Whalen, 1998;
Morris, 1998). Despite these age-related issues, a competing theory has been asserted
regarding older adults’ ability to decode emotional facial expressions. The
socioemotional  selectivity  theory  asserts  that  social  behavior  is  essentially  a  byproduct  of 
time (Carstensen, Isaacowitz, & Charles, 1999). In a sense, time can be thought of as the
chronological  age  of  a  human.  As  the  human  ages,  they  essentially  have  less  time  to  live 
and  fulfill  goals.  This  affects  the  way  they  view  their  decisions  and  weight  their  goals. 
The two types of goals that make up the socioemotional selectivity theory are knowledge-based
and emotion-based goals (Carstensen, Isaacowitz, & Charles, 1999). Younger
adults are more likely to pursue knowledge-based goals because they have more time
potential. The trade-off for knowledge in lieu of emotional goals appears to be a worthy
endeavor.  Older  adults  supposedly  take  the  opposite  approach  and  view  emotional-based 
goals as top priority. Older adults view time as a non-renewable resource, and seek to
spend any time they have left enjoying positive emotional experiences (Carstensen,
Isaacowitz, & Charles, 1999).

According  to  the  socioemotional  selectivity  theory,  older  adults  may  actually  be 
more  aware  of  certain  emotional  situations  and  images  than  non-emotional  (Orgeta  & 
Phillips,  2007).  Orgeta  and  Phillips  (2007)  showed  older  adults  as  being  more  accurate  at 
identifying positive facial expressions, as opposed to negative facial expressions. Older
adults  were  found  to  identify  positive  emotions  as  accurately  as  younger  adults.  There 
was  no  significant  difference  between  the  older  adults  and  younger  adults  in  terms  of 
identifying  positive  facial  emotions  (i.e.,  happiness  and  surprise).  However,  older  adults 
were  significantly  worse  than  younger  adults  at  identifying  negative  facial  emotions  (i.e., 
sadness,  anger,  and  fear).  The  results  of  this  study  indicated  that  there  is  an  age-related 
difference  for  the  decoding  of  negative  facial  expressions,  but  not  positive  facial 
expressions  (Orgeta  &  Phillips,  2007).  The  ease  of  recognition  for  certain  emotional 
expressions  is  a  phenomenon  pertinent  to  this  research  area.  As  Orgeta  and  Phillips 
(2007)  showed,  older  adults  may  have  a  positivity  bias  that  allows  them  to  overcome  any 
cognitive  decrements  that  interrupt  other  emotional  decoding,  thus  decoding  positive 
facial  expressions  as  accurately  as  younger  adults.  Other  research  has  supporting  data 
showing  that  positive  expressions  (e.g.,  happiness)  are  processed  more  quickly,  supported 
by faster N170 latencies (Batty & Taylor, 2003). Perhaps this quick processing contributes
to  the  robustness  of  the  happy  facial  expression  compared  to  other  expressions. 

A  previous  study  manipulated  the  factors  of  chronological  age  and  the 
participant’s working self-concept to determine if the positivity effect could in fact be
evoked in younger adults, and likewise the negativity effect in older adults (Lynchard &
Radvansky,  2012).  During  the  experiment  the  participant  would  complete  a  possible 
selves  orienting  task.  The  older  adults  completed  the  younger  possible  selves  orienting 
task,  while  the  younger  adults  completed  the  older  possible  selves  orienting  task. 
Essentially,  this  made  the  participant’s  working  self-concept  the  opposite  of  their 
chronological  age.  The  results  showed  a  reversal  of  stereotypical  age-related  emotional 
information  processing.  Younger  adults  displayed  a  positivity  effect,  which  is  thought  to 
be  a  unique  attribute  of  older  adults.  Similarly,  older  adults  displayed  a  negativity  effect, 
which  is  thought  to  be  unique  to  younger  adults  (Lynchard  &  Radvansky,  2012).  This 
study  showed  that  more  than  just  chronological  age  plays  a  role  in  the  socioemotional 
selectivity  theory.  Humans  are  subject  to  emotional  information  processing  biases  based 
on  less  concrete  variables  such  as  their  working  self-concept. 

Decoding  facial  expressions  is  a  cross-cultural  behavior  that  is  a  critical  part  of 
human  life.  There  are  six  basic  emotions  that  transcend  culture.  These  are:  anger, 
happiness, fear, surprise, disgust, and sadness (Ekman & Friesen, 1975). These emotions
can  be  represented  with  facial  expressions  (Lee,  2006;  Batty,  2003).  Because  these  facial 
expressions are not confined to specific cultures, there are no constraints on the ability of
different groups of people to successfully decode these facial expressions. It appears that
increasing  age  is  a  factor  that  may  cause  differences  in  aspects  of  facial  expression 
decoding,  while  cultural  background  seems  to  be  of  no  hindrance.  The  unique  quality  that 
facial expressions have in their prevalence and familiarity in human culture makes them a
good candidate for an ambient display. This quality of facial expressions allows the
heuristic  of  matching  the  system  to  the  real  world  to  be  met. 

Limitations  of  Previous  Literature 

The  previous  literature  has  provided  a  foundation  for  knowledge  about  facial 
expressions,  but  there  are  limitations  to  these  studies.  The  Hess  (1997)  study  presented 
emotional  facial  expressions  in  a  single-task  format.  The  participants  viewed  the  image 
and  rated  it  on  the  emotionality  and  intensity  that  they  perceived.  This  methodology  does 
not  clarify  whether  facial  emotion  decoding  is  truly  resource/attention-free  as 
neuropsychological  studies  suggest.  A  dual-task  experiment  should  be  implemented  to 
properly measure attention usage. To obtain these data, measures of response time,
accuracy,  and  subjective  workload  should  be  used.  The  Hess  (1997)  study  also  measured 
decoding  accuracy  for  each  facial  expression  image  through  the  presentation  of  several 
emotion  scales  at  once.  The  participant  was  presented  with  seven  emotional  labels,  which 
they  manipulated  to  show  the  intensity  of  emotion  for  the  previous  picture.  Instead  of 
presenting  seven  individual  scales,  it  seems  to  be  less  complicated  to  present  one  scale  or 
to  have  a  quick  input  device  (e.g.,  keyboard  number  keys)  after  the  image  is  viewed. 

The Hess (1997) study presented facial expression intensity in increments of 20%
intensity.  This  intensity  scale  may  not  provide  enough  precision  or  a  complete  spectrum 
of  facial  expression  decoding  data.  The  Orgeta  and  Phillips  (2007)  study  also  presented 
only  four  intensity  levels.  The  number  of  intensity  levels  may  need  to  be  increased  (i.e., 
create smaller increments of percentage changes between stimuli) to capture a more
accurate representation of participants’ ability to decode facial expression. Another
limitation in the Orgeta and Phillips (2007) study was that the facial images were presented in
increasing  order  as  the  participant  advanced  through  the  experiment.  This  method  may 
have  led  to  participants  forming  an  anticipation  bias  that  the  next  facial  image  was  going 
to  be  more  expressive. 

Previous  research  has  also  provided  evidence  that  age-related  effects  may  cause 
differences  in  the  ability  for  humans  to  properly  decode  facial  expressions.  It  has  been 
shown  that  older  adults  are  worse  at  identifying  negative  facial  expressions  (i.e.,  sadness, 
anger,  and  fear).  Older  adults  struggled  significantly  versus  younger  adults  in  properly 
recognizing the negative emotions at intensity levels of 50%, 75%, and 100%. It
appears  that  older  adults  have  a  higher  recognition  threshold  for  certain  negative 
emotions  than  younger  adults.  Basically,  older  adults  do  not  pick  up  on  negative  facial 
stimuli  as  easily  as  younger  adults  and  need  more  intense  facial  expressions  to  determine 
the  appropriate  emotional  state  (Orgeta  &  Phillips,  2007).  In  order  to  determine  if 
theories  such  as  the  socioemotional  selectivity  theory  pertain  to  Chernoff  face 
recognition,  there  needs  to  be  an  independent  variable  of  age  with  levels  of  younger  and 
older  adults. 

The  variable  of  gender  of  the  facial  expression  stimuli  could  be  considered  a 
confounding  variable.  Hess  (1997)  used  two  male  and  two  female  actors  to  create  facial 
expressions  for  their  study.  Results  of  this  study  showed  that  the  gender  of  the  stimuli 
(i.e.,  actors)  did  influence  participant  rating  accuracy.  For  the  expressions  of  happy  and 
sad, there was an interaction of the gender of the stimuli x intensity of the expression
(Hess, 1997). Because of this reported interaction, it would be beneficial to use
non-gender-specific stimuli to eliminate this confounding variable.

Previous  studies  have  looked  at  users’  ability  to  properly  decode  facial  expression 
type  (Ekman  &  Friesen,  1975),  intensity  (Tsurusawa,  Goto,  Mitsudome,  Nakashima,  & 
Tobimatsu, 2007; Hess, 1997), and the effectiveness of Chernoff faces (Chernoff, 1973;
Tsurusawa,  Goto,  Mitsudome,  Nakashima,  &  Tobimatsu,  2007;  Morris,  Ebert,  & 
Rheingans,  2000).  The  purpose  of  the  current  study  is  to  examine  the  users’  ability  to 
accurately  decode  a  quantitative  value  from  Chernoff  facial  expressions. 

Overview  of  the  Current  Study 

In  order  to  determine  the  attention  usage  by  the  participants,  a  dual-task 
methodology was used. Our study used the dual-task paradigm to measure the
attention-demanding characteristics of facial displays. The Hess (1997) study measured
participants’ decoding accuracy with several scales after each trial. This method may
create  confusion  for  the  participant,  and  not  accurately  record  participant  decoding  time. 
The  interface  should  allow  for  quick  and  simple  input  of  the  facial  expression  intensity 
from  the  participant.  The  current  study  used  only  one  measurement  scale  (direct  key 
entry)  after  each  trial  to  eliminate  any  confusion  for  the  participants  about  what  the  scales 
are  measuring  and  give  a  better  approximation  about  how  quickly  the  participant  can 
decode  the  facial  expression.  In  the  Orgeta  and  Phillips  (2007)  study  the  facial 
expressions  were  shown  in  increasing  order.  This  technique  was  not  replicated  in  the 
current  study.  Instead,  a  randomized  sequence  of  facial  expression  stimuli  was  used  to 
control for any biases that could be formed due to participant expectations. The Chernoff
face stimuli were manipulated differently compared to previous research (Chernoff,
1973; Tsurusawa, Goto, Mitsudome, Nakashima, & Tobimatsu, 2007; Morris, Ebert, &
Rheingans, 2000). Only the mouth was manipulated in order to gain understanding about
the effect of this one variable on decoding. Finally, the current study used a more precise
facial  expression  intensity  scale  than  previous  research  (Hess,  1997;  Orgeta  &  Phillips, 
2007).  To  accomplish  this,  a  facial  expression  scale  presenting  emotions  in  increments  of 
10% was used. Our assumption was that by making these modifications the current study
would  be  able  to  address  the  research  question  with  more  accuracy. 

Hypotheses  of  the  Current  Study 

The first hypothesis (H1) was that there would be no age differences in facial
decoding  performance  in  the  happy  facial  expression  condition,  but  that  there  would  be 
decoding  performance  differences  in  the  sad  facial  expression  condition.  The  rationale 
behind  expecting  no  age  difference  in  the  happy  facial  expression  condition  is  based  on 
the  socioemotional  selectivity  theory  and  research  that  supports  positive  expressions  as 
more identifiable, referred to as the “happy face advantage” (Ekman & Friesen, 1975;
Orgeta & Phillips, 2007; Calvo & Lundqvist, 2008). The rationale for the age-related
difference  in  the  sad  facial  expression  condition  is  based  on  older  adults’  difficulty  in 
perceiving  sad  facial  expressions  (Orgeta  &  Phillips,  2007),  and  the  negativity  effect  seen 
in younger adults (Lynchard & Radvansky, 2012).

The second hypothesis (H2) was related to the rationale of hypothesis H1 (i.e.,
effect  of  the  happy  face  advantage),  namely  that  even  in  the  presence  of  another  task, 
there would be no age differences in happy facial expression decoding because of its
presumed pre-attentiveness. However, we assumed that sad facial expression decoding
would  require  attentional  capacity,  and  thus  be  affected  by  the  presence  of  a  dual-task.  If 
the  decoding  of  happy  facial  expression  is  actually  resource-free  (Lee,  2006;  Whalen, 
1998;  Morris,  1998),  then  facial  decoding  in  the  dual-task  phase  should  be  equivalent  to 
decoding in the single-task condition. We expected similar performance scores for
younger and older adults in the happy condition, regardless of phase (single or dual). This
would indicate that the happy facial expressions are able to mitigate the dual-task decrement
that  would  be  expected  for  stimuli  that  demand  more  attention,  which  we  expect  to  be  the 
sad  facial  expressions.  Older  adults’  performance  with  sad  facial  expressions  is  expected 
to  be  worse  (compared  to  their  single-task  baseline),  due  to  their  low  negative  emotional 
sensitivity  (positivity  bias)  and  the  added  cognitive  load  of  the  dual-task.  We  also  expect 
younger adults’ performance to decrease due to the additional cognitive load of the
dual-task condition, which we expect will degrade any benefit of the negativity bias.
Additionally, research has shown younger adults to be quicker and more accurate at
decoding  happy  expressions  versus  sad  facial  expressions  (Hess,  1997;  Calvo  & 
Lundqvist,  2008). 

METHODS 

Participants 

Eighty-three  participants  (42  younger  adults,  41  older  adults)  were  recruited  for 
the current study. The younger adult age range was 18-21 (M = 18.6, SD = .89) and the
older adult age range was 65-84 (M = 72.4, SD = 5.19). Younger adults were recruited
from psychology courses and received class credit for participation. Older adults were
recruited from a pre-existing database of volunteers who lived in the surrounding
communities.  Older  adults  received  $25  for  participation. 

Design 

This study was a 2 (age group: younger, older) x 2 (facial expression condition:
happy, sad) x 10 (facial expression intensity: 0%-90%) x 2 (task phase: single, dual)
mixed design. Age group was a quasi-independent grouping variable. Facial expression
condition  was  between-groups,  while  facial  expression  intensity  and  task  phase  were 
within-groups.  The  dependent  variables  measured  were:  the  speed  (ms)  for  the  block  task, 
the speed (ms) of response on the facial expression task, the number of “misses” on the
facial expression task, the number of blocks cleared, facial expression intensity rating,
and  decoding  accuracy  (i.e.,  slope  value)  of  the  correspondence  between  the  face 
presented  and  the  facial  expression  intensity  rating. 
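
To make the last dependent variable concrete, the sketch below computes a decoding slope for one participant by regressing intensity ratings on presented intensity. The report specifies only that decoding accuracy was a slope value; the use of ordinary least squares, the function name decoding_slope, and the 0-9 coding of intensity levels are assumptions made for illustration.

```python
def decoding_slope(presented, rated):
    """Least-squares slope of rated intensity on presented intensity.

    Both arguments are equal-length sequences, one entry per trial
    on which the participant responded (misses are excluded beforehand).
    """
    n = len(presented)
    if n != len(rated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 trials")
    mean_x = sum(presented) / n
    mean_y = sum(rated) / n
    sxx = sum((x - mean_x) ** 2 for x in presented)
    if sxx == 0:
        raise ValueError("presented intensities must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(presented, rated))
    return sxy / sxx


# Hypothetical participant who compresses the scale by half.
levels = list(range(10))             # 0 = neutral, 9 = 90% expressive
ratings = [0.5 * lvl for lvl in levels]
slope = decoding_slope(levels, ratings)
```

Under this coding, a slope near 1 would indicate ratings that track the presented intensity closely, while a slope near 0 would indicate ratings that do not vary with the presented face.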

Materials 

The  experiment  was  presented  on  19-inch  LCD  monitors  and  participants  made 
responses  using  the  keyboard.  Participants  were  seated  in  office  chairs  about  18-24 
inches  from  the  screen  in  a  laboratory  environment.  The  experiment  was  programmed 
using  Real  Basic. 

Surveys  &  Abilities 

Participants  completed  a  computerized  cognitive  abilities  battery.  These  tests 
gathered  information  on  participants’  working  memory,  perceptual  speed,  and 
vocabulary.  Participants  also  completed  a  computerized  version  of  the  NASA-TLX 
survey to measure subjective workload.

Tasks

The  block  task  was  a  game  similar  to  the  game  Tetris  (Appendix  A).  The  block 
task  consisted  of  moving  multi-colored  blocks.  The  main  objective  of  the  block  task  was 
to  “clear”  block  rows  or  columns  by  manipulating  the  blocks  using  the  arrow  keys  and 
space  bar.  To  successfully  “clear”  a  block  row  or  column,  the  participant  was  required  to 
align  three  blocks  of  the  same  color.  This  task  was  used  in  the  dual-task  as  the  primary 
task  due  to  its  supposed  high  attentional  demand. 

The  purpose  of  the  facial  expression  decoding  task  was  to  identify  the  level  of 
emotion presented by a computer-generated facial expression (Appendix B). The facial
expression  stimuli  were  rendered  using  the  statistical  program  R.  This  allowed  the 
experimenter  to  have  control  over  the  faces  and  manipulate  their  facial  expression 
intensity  as  desired.  The  facial  expression  stimuli  were  line  drawings  composed  of  black 
lines  on  a  white  background.  This  eliminated  any  confounding  variables  due  to  the 
gender,  ethnicity,  or  age  of  the  stimuli.  There  were  19  images:  9  happy  stimuli  (ranging 
from 10% to 90% expressive), 9 sad stimuli (ranging from 10% to 90% expressive),
and one neutral stimulus (0% expressive); see Appendix C. The range
of  expressiveness  was  chosen  from  0%-90%  in  an  effort  to  make  a  match  between  the 
key number pad and the expression levels. The images were 170 pixels by 250 pixels.

Procedure

Participants  were  randomly  assigned  to  experimental  conditions  (happy  or  sad) 
prior  to  the  experiment.  The  participants  were  given  an  informational  letter  before  the 
experiment began. The experiment consisted of three phases. The participants first completed
two successive single tasks (i.e., the block task and facial expression decoding task) to
record  baseline  data  on  their  abilities,  and  to  become  familiar  with  each  task.  To  examine 
the  attentional  demands  of  decoding  Chernoff  faces,  participants  then  engaged  in  the 
dual-task  phase.  Participants  were  instructed  to  focus  on  the  block  task  (i.e.,  primary  task) 
and  consider  it  to  be  the  most  important  task.  This  spatial-manipulation  task  was  chosen 
due  to  the  expectation  of  being  cognitively  taxing  for  the  participants.  Participants  were 
told  to  try  to  complete  the  facial  expression  decoding  task  (i.e.,  secondary  task) 
effectively,  but  not  to  sacrifice  their  primary  task  performance  during  the  dual-task  phase. 

In phase 1, participants performed the block task in a single-task environment.
The  participant  had  to  reach  a  pre-set  score  (based  on  number  of  blocks  cleared)  to 
complete  the  task.  Once  the  participant  completed  this  phase,  the  program  proceeded  to 
phase  2.  In  phase  2  of  the  experiment,  participants  were  asked  to  respond  to  Chernoff 
facial  expressions  that  were  flashed  on  the  computer  screen.  The  participants  were  in  one 
of  two  facial  expression  conditions  (i.e.,  happy  or  sad)  and  only  saw  faces  related  to  their 
facial  expression  condition. 

Once  phase  2  began,  the  Chernoff  facial  expression  appeared  in  a  window  on  the 
computer  screen.  The  facial  expressions  were  shown  in  a  randomized  order  in  regard  to 
their  intensity  level.  During  the  time  interval  that  the  facial  expression  was  present, 
participants  attempted  to  respond  to  the  facial  expression  using  the  number  keys.  If  the 
participant  did  not  hit  a  number  key  before  this  time  elapsed  then  a  “miss”  was  recorded. 
Regardless  of  whether  the  participants  had  responded  or  missed  making  a  response,  after 
three to five seconds (randomized facial expression appearance time) the screen went
back to being blank until the next trial. There were 60 trials in each condition (i.e., 6
exposures  to  each  of  the  stimuli  for  a  specific  condition).  After  the  participants  were 
exposed  to  all  60  stimuli  the  program  proceeded  to  phase  3. 
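
The phase 2 trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the experiment software: it assumes ten intensity levels (0 to 9) shown six times each, and it treats the three-to-five-second value as a uniformly drawn interval. The function name `build_trial_schedule` and its parameters are hypothetical.

```python
import random

def build_trial_schedule(levels=range(10), exposures=6, gap_range=(3.0, 5.0), seed=None):
    """Return a shuffled list of (intensity_level, interval_seconds) trials.

    Assumptions (not taken from the experiment software): ten intensity
    levels, six exposures each, and a uniformly distributed interval.
    """
    rng = random.Random(seed)
    # 10 intensity levels x 6 exposures = 60 trials per condition.
    trials = [level for level in levels for _ in range(exposures)]
    # Randomize the order of trials with respect to intensity level.
    rng.shuffle(trials)
    # Draw a 3-5 s interval per trial so that onsets are not predictable.
    return [(level, rng.uniform(*gap_range)) for level in trials]

schedule = build_trial_schedule(seed=1)
print(len(schedule))
```

Passing a fixed `seed` makes a schedule reproducible, which is convenient when checking that every level appears the intended number of times.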

In  phase  3,  participants  were  exposed  to  both  phases  1  and  2  simultaneously  (see 
Appendix  D).  This  created  a  dual-task  situation.  The  task  goals  defined  for  the  two 
single-tasks  remained  the  same  for  the  dual-task  phase.  However,  participants  were  told 
to  treat  the  block  task  as  the  primary  task.  This  phase  continued  until  all  facial  expression 
stimuli  were  presented  to  the  participants.  After  the  participants  completed  the 
experiment,  the  computer  loaded  the  computerized  NASA-TLX  survey.  Subsequently, 
the  battery  of  computerized  cognitive  abilities  tests  was  loaded  for  the  participants  to 
complete.  Once  the  participants  completed  the  cognitive  abilities  battery  they  were 
finished  with  the  study  and  permitted  to  leave. 

RESULTS 

Participants’  data  were  removed  based  on  two  criteria:  1)  if  they  missed  all  the 
faces  presented  in  phase  3  (i.e.,  indicating  little  attention  paid  to  the  secondary  task),  or  2) 
if  they  were  2  standard  deviations  below  the  group  average  for  clearing  blocks  in  phase  3 
(which indicated little attention being paid to the primary task). Participants who had
marginally low performance (on either of the aforementioned criteria) subsequently had
their cognitive abilities test results examined. If the participant had a cognitive ability test
score 2 standard deviations below the group average (on any of the three ability tests),
then their data were removed from the final analysis. These criteria, together with one
further exclusion, resulted in the removal of nine participants: six participants who missed
all the faces presented in phase 3, one participant who scored 2 standard deviations below
the group average for clearing blocks, one participant who missed most of the faces
presented in phase 3 (55 out of 60) and scored 2 standard deviations below the group
average on two cognitive ability tests, and one participant who had participated in the
pilot testing for the current study.
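
The two primary exclusion criteria can be expressed as a short screening routine. The sketch below is illustrative only: the record fields (`id`, `misses`, `blocks`) and the function name `flag_exclusions` are hypothetical, the sample standard deviation is assumed, and the follow-up check of cognitive ability scores for marginal cases is not implemented.

```python
from statistics import mean, stdev

def flag_exclusions(participants, n_faces=60):
    """Flag participants who meet either primary exclusion criterion.

    participants: list of dicts with hypothetical keys 'id', 'misses'
    (faces missed in phase 3) and 'blocks' (blocks cleared in phase 3).
    """
    blocks = [p["blocks"] for p in participants]
    # Cutoff: 2 standard deviations below the group mean (sample SD assumed).
    cutoff = mean(blocks) - 2 * stdev(blocks)
    flagged = {}
    for p in participants:
        reasons = []
        if p["misses"] >= n_faces:       # criterion 1: missed every face
            reasons.append("missed all faces")
        if p["blocks"] < cutoff:         # criterion 2: very low primary-task score
            reasons.append("blocks cleared 2 SD below mean")
        if reasons:
            flagged[p["id"]] = reasons
    return flagged
```

Returning the reasons rather than a simple yes/no makes it easy to report, as above, how many participants were removed under each criterion.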

The  following  results  section  is  organized  by  task  phase  (i.e.,  single  or  dual).  To 
remind  the  reader,  phase  2  was  the  single-task  for  facial  expression  decoding  and  phase  3 
was  the  dual-task  condition.  The  results  of  the  single-task  facial  expression  decoding 
condition (phase 2) inform hypothesis H1, while the dual-task facial expression decoding
condition  (phase  3)  results  are  directly  relevant  to  hypothesis  H2.  In  the  single-task  facial 
expression  decoding  condition  (phase  2),  the  following  dependent  variables  were 
analyzed:  intensity  key  pressed,  facial  expression  decoding  accuracy,  facial  expression 
response  time  (ms),  and  the  amount  of  face  misses  for  the  facial  expression  task.  In  the 
dual-task  portion  (phase  3),  the  following  dependent  variables  were  analyzed:  intensity 
key  pressed,  facial  expression  decoding  accuracy,  facial  expression  response  time  (ms), 
the  amount  of  face  misses  for  the  facial  expression  task,  and  computed  workload  from  the 
NASA-TLX  survey.  An  alpha  level  of  .05  was  used  for  all  of  the  following  statistical 
tests.  Tests  for  the  assumption  of  normality  (i.e.,  histogram,  Q-Q  plot)  and 
homoscedasticity  were  conducted  and  showed  the  data  met  the  assumption  for  normality 
and homoscedasticity. For all mixed measures ANOVAs, the number of levels of the
repeated measures IV (i.e., single task phase, dual task phase) was less than three, so
sphericity was assumed.

Phase 2 (Single-task, Facial Expression Decoding Only)

Intensity  Key  Pressed 

As  participants  were  presented  faces  during  phase  2,  they  were  asked  to  give 
intensity ratings about each face. In order to give these intensity ratings, participants
used the keyboard number keys as the input device. The intensity key pressed ratings for
a  participant  were  averaged  across  all  trials  for  phase  2.  This  yielded  a  mean  intensity  key 
pressed  value  that  could  be  analyzed  as  a  function  of  facial  expression  condition,  age 
group,  and  face  presented.  The  intensity  key  pressed  ratings  were  also  necessary  for  the 
calculation  of  decoding  accuracy,  which  will  now  be  explained. 

Decoding  Accuracy 

In  the  facial  expression  decoding  task,  participants  were  asked  to  view  facial 
expressions that were flashed on the computer screen (hereafter called “face presented”)
and  to  respond  with  an  intensity  rating  (“intensity  key  pressed”).  The  facial  expressions 
presented  ranged  from  0  (neutral)  to  9  (very  expressive).  Decoding  accuracy  was 
operationalized  as  the  correspondence  between  the  face  presented  and  participants’ 
intensity  key  pressed.  The  regression  slope  of  participants’  correspondence  was  used  to 
quantify  decoding  accuracy. 
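
As an illustration of this operationalization, the following sketch computes the least-squares slope of intensity key pressed on face presented. The data are hypothetical and the function name `decoding_slope` is an assumption; a slope near 1 indicates responses that track the stimulus closely, while a flatter slope indicates poorer discrimination.

```python
def decoding_slope(face_presented, key_pressed):
    """Least-squares slope of key pressed regressed on face presented."""
    n = len(face_presented)
    mean_x = sum(face_presented) / n
    mean_y = sum(key_pressed) / n
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(face_presented, key_pressed))
    sxx = sum((x - mean_x) ** 2 for x in face_presented)
    return sxy / sxx

faces = list(range(10))                      # intensity levels 0 (neutral) to 9
perfect = faces[:]                           # hypothetical: responses match the stimulus
compressed = [4 + 0.5 * f for f in faces]    # hypothetical: responses under-use the scale

print(decoding_slope(faces, perfect))
print(decoding_slope(faces, compressed))
```

The second, hypothetical respondent gives ratings that rise only half as fast as the stimulus, which is what a lower decoding accuracy looks like under this measure.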

A  hierarchical  regression  analysis  was  conducted  to  predict  intensity  key  pressed 
as  a  function  of  age  group,  facial  expression  condition,  and  face  presented.  The  predictor 
variables  of  age  group  and  facial  expression  condition  were  dummy-coded.  The  predictor 
variables  were  entered  in  three  steps,  which  resulted  in  three  different  models.  The  first 
step contained the following predictor variables: face presented, facial expression
condition, and age group. These predictor variables represented all of the main effects
tested  (model  1).  The  second  step  contained  the  predictor  variables  from  model  1  with  the 
addition  of  the  following  two-way  interactions:  age  group  x  facial  expression  condition, 
face  presented  x  age  group,  and  face  presented  x  facial  expression  condition  (model  2). 
The  third  step  contained  all  of  the  predictor  variables  from  model  1  and  model  2  with  the 
addition  of  the  following  three-way  interaction:  face  presented  x  age  group  x  facial 
expression  condition  (model  3). 
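
The three entry steps can be made concrete by listing the predictor columns each model contains. The sketch below is illustrative only: it assumes dummy codes of 0 = sad, 1 = happy and 0 = younger, 1 = older (the age coding is an assumption), forms each interaction as a product term, and uses a hypothetical helper name `predictors`. Fitting the models is left to any ordinary least-squares routine.

```python
def predictors(face, older, happy, step):
    """Return the predictor values entered up to a given step (1, 2, or 3).

    face:  intensity level of the face presented (0-9)
    older: dummy code for age group (assumed 0 = younger, 1 = older)
    happy: dummy code for condition (0 = sad, 1 = happy)
    """
    cols = [face, happy, older]                       # step 1: main effects
    if step >= 2:                                     # step 2: two-way interactions
        cols += [older * happy, face * older, face * happy]
    if step >= 3:                                     # step 3: three-way interaction
        cols += [face * older * happy]
    return cols

for step in (1, 2, 3):
    print(step, len(predictors(9, 1, 1, step)))
```

The column counts (3, 6, and 7 predictors) are consistent with the residual degrees of freedom reported for the three models (826, 823, and 822), which fall by three and then by one as terms are added.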

The  three  models  were  tested  for  their  ability  to  significantly  predict  participants’ 
intensity key pressed. Model 1 accounted for 44.4 % of the variance of intensity key
pressed (R² = .444, F(3, 826) = 220.11, p < .001). Model 2 accounted for 51 % of the
variance of intensity key pressed (R² = .510, F(6, 823) = 142.62, p < .001). Model 3
accounted for 51.1 % of the variance of intensity key pressed (R² = .511, F(7, 822) =
122.66, p < .001). The addition of the two-way interactions in model 2 resulted in an R²
change value of .065, or 6.5 %, while the addition of the three-way interaction in model 3
resulted in an R² change value of .001, or 0.1 %. The addition of the three-way interaction
(via model 3) did not add a significant amount of predictive power to the model.
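
Whether an added step contributes significant predictive power is conventionally judged with an F test on the change in explained variance. The sketch below applies that standard formula to the rounded values reported above; it is a rough check under that assumption, not a reproduction of the original analysis output, and the function name `f_change` is hypothetical.

```python
def f_change(r2_reduced, r2_full, added_terms, df_resid_full):
    """F statistic for the increase in explained variance between nested models."""
    gain_per_term = (r2_full - r2_reduced) / added_terms
    unexplained_per_df = (1 - r2_full) / df_resid_full
    return gain_per_term / unexplained_per_df

# Rounded explained-variance values and residual df as reported in the text.
step2 = f_change(0.444, 0.510, added_terms=3, df_resid_full=823)
step3 = f_change(0.510, 0.511, added_terms=1, df_resid_full=822)
print(round(step2, 2), round(step3, 2))
```

With these rounded inputs the step 2 statistic is roughly 37 and the step 3 statistic is roughly 1.7. The latter falls below the critical value of about 3.85 for 1 and 822 degrees of freedom at an alpha level of .05, in line with the conclusion that the three-way interaction added no significant predictive power. Small discrepancies from the reported t value are expected because the inputs are rounded to three decimals.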

The non-significance of the hypothesized three-way interaction of face presented
x age group x facial expression condition (b = -.11, t(822) = -1.39, p = .165) caused
slope comparisons to be confined to the two-way interactions in model 2. The two-way
interaction terms in the hierarchical regression were a method to test for a significant
difference between the regression line slopes. Therefore, when a two-way interaction was
found to be significant, it showed the two regression slopes to be significantly
different. First, main effects and interactions for intensity key pressed will be addressed,
followed by interactions related to decoding accuracy.

Main  Effects  and  Interactions  for  Intensity  Key  Pressed 
There was a significant main effect of face presented on participants’ intensity
key pressed (b = .53, t(826) = 25.27, p < .001), which meant participants were generally
able to discriminate the various levels of face presented. For each one-level increase in
the face presented (across the range of 0 % to 90 %), intensity key pressed increased by
.53 units. There was a significant main effect of facial expression
condition (b = .57, t(826) = 4.67, p < .001). This main effect revealed a significant
increase in mean intensity key pressed from the sad facial expression condition (M =
4.49, SD = 2.15) to the happy facial expression condition (M = 5.06, SD = 2.47). There
was no main effect of age group (b = .01, t(826) = .09, p = .928).

The two-way interaction of age group x facial expression condition was
significant (b = -.64, t(823) = -2.82, p < .01). Due to the dichotomous nature of the
predictor variables (happy, sad; younger, older), the lines only contain two data points
(i.e., mean values of intensity key pressed). The interaction can be conceptualized as the
difference between the differences in mean values of intensity key pressed for each age
group. The difference between the means (i.e., slope) for younger adults was .88, which
is significantly different from the difference between the means, .25, for older adults.

Slopes were found using the following formula: b = (Y2 - Y1) / (X2 - X1), where the
mean values were used for Y and facial expression condition coding (0 = Sad, 1 = Happy)
was used for X. As Figure 1 illustrates, the two-way interaction was a result of the
significantly greater increase in mean intensity key pressed in the younger adult group as
a function of facial expression condition compared to older adults.
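
The arithmetic behind this "difference between the differences" can be checked directly from the values reported above. The sketch below is a minimal illustration; the helper names `two_point_slope` and `interaction_estimate` are hypothetical, and the inputs are the rounded means and slopes given in the text.

```python
def two_point_slope(y_at_0, y_at_1):
    """Slope between two group means when the predictor is coded 0 and 1."""
    return (y_at_1 - y_at_0) / (1 - 0)

def interaction_estimate(slope_reference, slope_other):
    """Difference between two slopes, i.e., a two-way interaction estimate."""
    return slope_other - slope_reference

# Overall means reported for the sad (coded 0) and happy (coded 1) conditions.
print(round(two_point_slope(4.49, 5.06), 2))        # 0.57

# Reported condition slopes for younger (.88) and older (.25) adults.
print(round(interaction_estimate(0.88, 0.25), 2))   # -0.63
```

The first value reproduces the reported main effect of facial expression condition (b = .57). The second is within rounding of the reported interaction coefficient (b = -.64), assuming the coefficient was computed from unrounded means.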



Figure  1.  Mean  intensity  key  pressed  by  facial  expression  condition  for  younger  and  older  adults. 

Interactions  for  Decoding  Accuracy 

The  two-way  interaction  of  face  presented  x  age  group  was  significant,  (b  =  -.18, 
t(823)  =  -4.46,  p  <  .001).  This  indicated  that  in  general,  younger  adults  were  significantly 
better  than  older  adults  at  accurately  decoding  the  faces  presented.  Participants’  facial 
expression  decoding  values  were  compared  between  the  younger  age  group  and  the  older 
age  group,  resulting  in  an  observed  significant  decrease  in  slope  (i.e.,  a  younger  adult 
slope of b = .63 versus an older adult slope of b = .43), illustrated by Figure 2.

Figure 2. Mean intensity key pressed by face presented for younger and older adults.

The  two-way  interaction  of  face  presented  x  facial  expression  condition  was 
significant,  (b  =  .35,  t(823)  =  8.78,  p  <  .001).  This  indicated  that  all  participants  were 
generally  more  accurate  at  decoding  the  happy  facial  expression  condition  than  the  sad 
facial  expression  condition.  This  two-way  interaction  is  illustrated  by  Figure  3. 
Participants’  (collapsing  across  age  group)  facial  expression  decoding  values  were 
compared  between  the  sad  facial  expression  condition  and  happy  facial  expression 
condition, yielding a significant difference in slopes (i.e., a sad slope of b = .35 versus a
happy slope of b = .71).

Figure 3. Mean intensity key pressed by face presented for sad and happy facial expression conditions.

The three-way interaction for face presented x age group x facial expression
condition was not significant (b = -.11, t(822) = -1.39, p = .17). This means that facial
expression decoding accuracy did not differ as a function of age group and facial
expression condition. This does not support hypothesis H1, which predicted no age
differences in decoding accuracy in the happy facial expression condition, while
predicting an age difference in the sad facial expression condition.

Intensity  Key  Pressed  Response  Time 

The  speed  at  which  participants  made  responses  could  be  interpreted  as  the  level 
of  attentional  demand  required  of  the  stimuli.  The  purpose  of  measuring  intensity  key 
pressed  response  time  was  to  examine  whether  attentional  demand  changed  as  a  function 
of facial expression condition, age group, or an interaction of facial expression condition
x age group. The response time for a participant was operationalized as the time in
milliseconds (ms) it took a participant to depress a number key when presented with a
facial expression. The facial expression would appear at random intervals throughout
phase 2 (every 3-5 seconds) to avoid a predictable appearance interval. However, the face
was shown for the same amount of time on every trial (2 seconds for
younger adults, 2.5 seconds for older adults). Response time data are reported in
seconds for ease of understanding.

A 2 (age group) x 2 (facial expression condition) ANOVA was conducted to
analyze participants’ response time data. A significant main effect was found for age
group (F(1, 81) = 317.80, p < .001). Younger adults’ response time (M = 1.27 s, SD = .11
s) was significantly faster than older adults’ response time (M = 1.9 s, SD = .20 s). There
was no main effect for facial expression condition (F(1, 81) = .342, p = .56), and no
significant interaction for age group x facial expression condition (F(1, 81) = .03, p =
.86). Regardless of facial expression condition, younger adults had significantly faster
response times than older adults, as illustrated by Figure 4.

Figure 4. Mean response time (ms) by age group for sad and happy facial expression conditions.

Face  Misses 

The  extent  that  participants  “missed”  identifying  faces  in  the  allotted  time  could 
be used to understand the attention demanding characteristics of the faces. We anticipated
that pre-attentive faces would be “missed” less often than faces that required more attention.
Face  misses  were  operationalized  as  situations  where  the  participant  did  not  respond,  or 
failed  to  press  the  number  key  (i.e.,  intensity  key  pressed)  within  the  allotted  time 
interval.  When  participants  “missed”  a  facial  expression  it  was  recorded,  and  misses  were 
summed  and  averaged  for  participants’  experimental  session. 

A 2 (age group) x 2 (facial expression condition) ANOVA was conducted to
analyze participants’ number of misses. A significant main effect was found for facial
expression condition (F(1, 81) = 5.9, p = .02). Participants in the sad facial expression
condition had significantly more misses (M = 8.53, SD = 5.48) than participants in the
happy facial expression condition (M = 6.05, SD = 3.6). There was no main effect of age
group (F(1, 81) = 2.68, p = .11), and no interaction for age group x facial expression
condition (F(1, 81) = 3.66, p = .06). Figure 5 highlights the main effect of facial
expression condition and the marginally significant interaction between age group x
facial expression condition.


Figure 5. Mean number of face misses by age group for sad and happy facial expression conditions. Error bars: +/- 1 SE.


In  sum,  the  results  of  the  analysis  of  task  phase  2  show  that  the  variables  of  face 
presented,  facial  expression  condition,  and  age  group  had  a  significant  effect  on 
participants’  performance.  The  significant  main  effect  of  face  presented  on  participants’ 
intensity  key  pressed  showed  a  positive  linear  trend  for  intensity  key  pressed  as  the 
variable  of  face  presented  increased.  The  significant  main  effect  of  facial  expression 
condition  on  intensity  key  pressed  revealed  a  significant  increase  in  mean  intensity  key 
pressed  when  comparing  between  the  sad  facial  expression  condition  and  the  happy  facial 
expression condition. The significant main effect of age group on response time showed
younger adults’ response time was significantly faster than older adults’ response time.
The  significant  main  effect  of  facial  expression  condition  on  face  misses  showed 
participants  in  the  sad  facial  expression  condition  had  significantly  more  misses  than 
participants  in  the  happy  facial  expression  condition.  The  significant  two-way  interaction 
of  age  group  x  facial  expression  condition  showed  a  significantly  higher  intensity  key 
pressed  for  younger  adults  compared  to  older  adults,  when  comparing  between  the  sad 
and  happy  facial  expression  condition.  The  significant  two-way  interaction  of  face 
presented  x  facial  expression  condition  showed  participants  in  the  happy  facial 
expression  condition  had  significantly  higher  decoding  accuracy  than  those  in  the  sad 
facial  expression  condition.  However,  the  lack  of  a  three-way  interaction  suggested  that 
the  happy  face  advantage  for  decoding  was  not  significant  for  older  adults.  The 
significant  two-way  interaction  of  face  presented  x  age  group  showed  younger  adults  had 
a  significantly  higher  decoding  accuracy  than  older  adults. 

The data examined above came from task phase 2 (single-task phase), where
presumably all attention was devoted to the facial expression decoding
task. To examine the attentional demands of facial decoding, performance in the facial
expression decoding task was examined in the context of a dual-task environment (phase 3).

Task  Phase  3  (Dual-task,  Block  Task  and  Facial  Expression  Decoding) 

In task phase 3, participants were given a primary task (block game) and a
secondary task (facial expression decoding). This dual-task paradigm allowed participant
performance data from phase 2 to be compared to phase 3 (i.e., a divided-attention
situation). The purpose of the following analyses was to determine the extent to which
facial expression decoding was disrupted (i.e., dual-task cost) by the block task.

In  phase  3,  intensity  key  pressed  and  decoding  accuracy  were  operationalized  as 
described  in  phase  2.  However,  the  new  independent  variable  of  task  phase  provided  a 
method  to  compare  performance  variables  as  a  function  of  single  or  dual-task. 

A  hierarchical  regression  analysis  was  conducted  to  predict  intensity  key  pressed 
as  a  function  of  age  group,  facial  expression  condition,  face  presented,  and  task  phase. 
The  predictor  variables  of  age  group,  facial  expression  condition,  and  task  phase  were 
dummy-coded.  The  predictor  variables  were  entered  in  four  steps,  which  resulted  in  four 
different  models.  The  first  step  contained  the  following  predictor  variables:  face 
presented,  facial  expression  condition,  age  group,  and  task  phase.  These  predictor 
variables  represented  all  of  the  main  effects  tested  (model  1).  The  second  step  contained 
the  predictor  variables  from  model  1  with  the  addition  of  the  following  two-way 
interactions:  age  group  x  facial  expression  condition,  face  presented  x  age  group,  face 
presented  x  facial  expression  condition,  face  presented  x  task  phase,  task  phase  x  age 
group,  and  task  phase  x  facial  expression  condition  (model  2).  The  third  step  contained 
all  of  the  predictor  variables  from  model  1  and  model  2  with  the  addition  of  the  following 
three-way interactions: face presented x age group x facial expression condition, task
phase x age group x facial expression condition, face presented x task phase x age group,
and face presented x task phase x facial expression condition (model 3). The fourth step
contained all of the predictor variables from model 1, model 2, and model 3, with the
addition of the following four-way interaction: face presented x task phase x facial
expression condition x age group (model 4).

The models were tested for their ability to significantly predict participants’
intensity key pressed. Model 1 accounted for 43.6 % of the variance of intensity key
pressed (R² = .436, F(4, 1552) = 299.92, p < .001). Model 2 accounted for 49.3 % of the
variance of intensity key pressed (R² = .493, F(10, 1546) = 150.34, p < .001). Model 3
accounted for 49.6 % of the variance of intensity key pressed (R² = .496, F(14, 1542) =
108.33, p < .001). Model 4 accounted for 49.6 % of the variance of intensity key pressed
(R² = .496, F(15, 1541) = 101.21, p < .001). The addition of the two-way interactions in
model 2 resulted in an R² change value of .057, or 5.7 %, while the addition of the
three-way interactions in model 3 resulted in an R² change value of .003, or 0.3 %. The addition
of the four-way interaction resulted in no significant R² change compared to model 3.
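
As a consistency check, the overall F statistic of each model can be recovered from its proportion of variance explained, its number of predictors, and its residual degrees of freedom. The sketch below uses the standard formula with the rounded values reported above. The predictor counts (4, 10, 14, and 15) follow from the four entry steps described earlier, and the function name `overall_f` is hypothetical.

```python
def overall_f(r2, n_predictors, df_resid):
    """Overall F statistic implied by a model's proportion of variance explained."""
    return (r2 / n_predictors) / ((1 - r2) / df_resid)

# (variance explained, number of predictors, residual df) for models 1 to 4.
models = {
    1: (0.436, 4, 1552),    # four main effects
    2: (0.493, 10, 1546),   # plus six two-way interactions
    3: (0.496, 14, 1542),   # plus four three-way interactions
    4: (0.496, 15, 1541),   # plus the four-way interaction
}
for number, (r2, k, df) in models.items():
    print(number, round(overall_f(r2, k, df), 1))
```

Each recomputed value lands within rounding error of the corresponding reported statistic (299.92, 150.34, 108.33, and 101.21), which supports reading the degrees of freedom as 4, 10, 14, and 15 in the numerator.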

As expected (due to the low R² change value from model 2 to model 3), the
hierarchical regression showed non-significant values for all of the task phase related
three-way interactions: task phase x age group x facial expression condition (b = .08,
t(1542) = .21, p = .83), face presented x task phase x age group (b = -.02, t(1542) = -.35,
p = .72), and face presented x task phase x facial expression condition (b = -.05, t(1542) =
-.85, p = .40). This meant no two-way interactions significantly changed across the
predictor variable of task phase (e.g., face presented x facial expression condition did not
change due to task phase). It was determined that model 4 did not yield a significant
four-way interaction (b = -.14, t(1541) = -1.1, p = .269). Due to the non-significant results of
the three-way and four-way interaction terms, the following analyses concentrate on
model 1 and model 2. Slope comparisons will be confined to only two-way interactions
related to model 2. The analyses of model 1 and model 2 give a simplified overview (i.e.,
less complex interactions) of the effect of task phase on participant performance.

Main  Effects  and  Interactions  for  Intensity  Key  Pressed 
There was no main effect of task phase on participants’ intensity key pressed (b =
.09, t(1552) = .927, p = .354). As participants moved from single to dual-task there was
no significant difference for intensity key pressed values. The non-significant main effect
of task phase can be thought of as a manipulation check, indicating that participants did
not give the facial expression stimuli significantly different mean intensity ratings in the
single-task phase versus the dual-task phase.

There was no significant two-way interaction for facial expression condition x
task phase (b = .18, t(1546) = .99, p = .32). Facial expression condition did not have a
significant effect on the difference between the differences of means (i.e., slope) for
intensity key pressed, when comparing across task phase.

A significant two-way interaction was found for age group x task phase (b = .39,
t(1546) = 2.17, p = .03), illustrated by Figure 6. Task phase had a significant effect on the
difference between the differences of means (i.e., slope) for intensity key pressed, when
comparing across age group. Slopes were found using the following formula:
b = (Y2 - Y1) / (X2 - X1), where the mean intensity key pressed values were used for Y
and task phase coding (0 = Single, 1 = Dual) was used for X. The slope for younger adults
(b = -.05) was significantly different from the slope for older adults (b = .27). The change
in mean intensity key pressed as a function of task phase was significantly
greater for older adults than for younger adults.



Figure  6.  Mean  intensity  key  pressed  by  task  phase  for  younger  and  older  adults. 

Interactions  for  Decoding  Accuracy 

There  was  no  significant  two-way  interaction  of  face  presented  x  task  phase,  (b  = 
.04,  t(1546)  =  1.17,  p  =  .24).  Participants’  decoding  accuracy  (when  collapsing  across  age 
group  and  facial  expression  condition)  was  not  significantly  affected  by  the  task  phase  of 
the  experiment.  The  slope  values  for  each  task  phase  did  not  significantly  differ. 

No  significant  three-way  interactions  were  observed  as  a  function  of  task  phase. 
The three-way interaction of task phase x age group x facial expression condition was not
significant (b = .08, t(1542) = .21, p = .83), the three-way interaction of task phase x face
presented x age group was not significant (b = -.02, t(1542) = -.35, p = .72), and the
three-way interaction of task phase x face presented x facial expression condition was not
significant (b = -.05, t(1542) = -.85, p = .40). The non-significance of these three-way
interactions indicated that no two-way interactions significantly differed across task
phase. The significant two-way interaction of face presented x age group shown in the
single-task phase remained significant (b = -.20, t(720) = -4.14, p < .001) in the dual-task
phase, illustrated by Figure 7. This meant the significant interaction between face
presented x age group (i.e., younger adults had significantly higher decoding accuracy
than older adults) in the single-task was replicated in the dual-task. The two-way
interaction of face presented x facial expression condition shown in the single-task phase
remained significant (b = .30, t(720) = 6.13, p < .001) in the dual-task phase, illustrated
by Figure 8. This meant the significant interaction between face presented x facial
expression condition (i.e., decoding accuracy was significantly higher in the happy
condition than in the sad condition) in the single-task was replicated in the dual-task.

Essentially,  this  showed  there  was  no  dual-task  cost  for  these  two-way  interactions. 




Figure  7.  Mean  intensity  key  pressed  by  face  presented  for  younger  and  older  adults  (dual-task). 




Figure 8. Mean intensity key pressed by face presented for sad and happy facial expression condition (dual-task).

The four-way interaction of face presented x task phase x facial expression
condition x age group was not significant (b = -.14, t(1541) = -1.11, p = .27). This
finding showed that no three-way interactions significantly differed across task phase.
This showed a lack of dual-task cost for the interaction of face presented x facial
expression condition x age group. In the single-task happy facial expression condition,
the significant two-way interaction for face presented x age group (b = -.23, t(426) =
-5.03, p < .001) remained significant in the dual-task happy facial expression condition
(b = -.32, t(384) = -5.58, p < .001), illustrated by Figures 9 and 10. This meant the
significant interaction between face presented x age group (i.e., younger adults had
significantly higher decoding accuracy than older adults) in the single-task happy facial
expression condition was replicated in the dual-task happy facial expression condition.




Figure 9. Mean intensity key pressed by face presented for younger and older adults (single-task, happy
facial expression condition).


Figure  10.  Mean  intensity  key  pressed  by  face  presented  for  younger  and  older  adults  (dual-task,  happy 
facial  expression  condition). 

In the single-task sad facial expression condition, the non-significant two-way interaction
for face presented x age group (b = -.12, t(396) = -1.82, p = .07) remained non-significant
in the dual-task sad facial expression condition (b = -.07, t(335) = -.86, p = .39),
illustrated by Figures 11 and 12. This meant the non-significant interaction between face
presented x age group (i.e., younger adults had similar decoding accuracy to older adults)
in the single-task sad facial expression condition was replicated in the dual-task sad
facial expression condition.


Figure  11.  Mean  intensity  key  pressed  by  face  presented  for  younger  and  older  adults  (single-task,  sad 
facial  expression  condition). 


Dual  Task  Phase.  Sad  Facial  Expression  Condition 


Figure 12. Mean intensity key pressed by face presented for younger and older adults (dual-task, sad facial
expression condition).

Intensity Key Pressed Response Time

A mixed measures ANOVA was conducted on the response time data for facial
expression decoding. There was a significant main effect of task phase on response time
(F(1, 79) = 34.34, p < .001), illustrated by Figure 13. Response time for task phase 2 (M
= 1.59 s, SD = .36 s) was significantly faster than response time for task phase 3 (M = 1.72
s, SD = .38 s). There were no significant interactions for task phase x age group, task
phase x facial expression condition, or task phase x age group x facial expression
condition. There was a significant main effect for age group on response time (F(1, 79) =
345.50, p < .001). Response time for younger adults (M = 1.34 s, SD = .24 s) was
significantly faster than for older adults (M = 1.98 s, SD = .24 s), illustrated by Figure 14.
The  main  effect  for  facial  expression  condition  was  not  significant,  nor  was  the 
interaction  of  age  group  x  facial  expression  condition. 


Task  Phase 


Error  Bars:  +/-  1  SE 

Figure 13. Mean response time (ms) by task phase.

Figure 14. Mean response time (ms) by age group.


Face  Misses 


A mixed measures ANOVA was conducted on the number of face misses between
the single and dual-task phase. A significant main effect was found for task phase (F(1,
79) = 276.68, p < .001), such that participants had fewer misses in the single-task (M =
7.24, SD = 4.74) compared to the dual-task (M = 33.55, SD = 14.10), illustrated by Figure
15. There were no significant interactions for task phase x facial expression condition,
task phase x age group, or task phase x facial expression condition x age group. There
was no significant main effect for facial expression condition or age group. There was
also no significant interaction for facial expression condition x age group.

Task Phase

Error Bars: +/- 1 SE

Figure  15.  Mean  number  of  face  misses  by  task  phase. 

Blocks  Cleared 

A  2  (age  group)  x  2  (facial  expression  condition)  ANOVA  was  conducted  on  the 
number  of  blocks  cleared  in  the  dual-task  phase.  There  was  a  significant  main  effect  for 
age group (F(1, 79) = 160.29, p < .001), such that younger adults cleared significantly
more blocks (M = 46.95, SD = 10.37) than older adults (M = 20.07, SD = 8.61),
illustrated by Figure 16. There was no significant main effect of facial expression
condition or significant interaction of age group x facial expression condition.

Figure 16. Mean blocks cleared by age group.

NASA-TLX  Survey 

The  NASA-TLX  subjective  workload  survey  was  given  to  all  participants  in  order 
to  assess  the  amount  of  perceived  workload  they  experienced  during  the  dual-task  phase 
of the experiment. Data were only collected after the dual-task phase, so a comparison
across task phase could not be analyzed. A 2 (age group) x 2 (facial expression condition)
ANOVA was run to determine if the independent variables of age group and facial
expression condition had a significant effect on computed workload. There was no
significant main effect for age group (F(1, 78) = Al, p = .68), for facial expression
condition (F(1, 78) = 2.41, p = .13), or for the interaction of age group x condition (F(1,
78) = 1.64, p = .21). Neither age group nor facial expression condition significantly
affected participants’ subjective workload, illustrated by Figure 17.

Figure 17. Mean computed workload by age group for sad and happy facial expression conditions.

In  sum,  the  results  of  the  analysis  of  task  phase  3  show  that  facial  expression 
decoding  accuracy  did  not  significantly  differ  as  a  function  of  task  phase,  but  the 
measures  of  intensity  key  pressed,  response  time,  and  face  misses  did  show  a  dual-task 
cost.  There  was  a  main  effect  of  task  phase  on  response  time  for  all  participants,  which 
showed  faster  response  times  in  phase  2  compared  to  phase  3.  A  main  effect  of  age  group 
showed  older  adults  to  be  significantly  slower  in  response  time  compared  to  younger 
adults.  There  was  also  a  main  effect  of  task  phase  on  the  amount  of  faces  that  were 
missed, which showed more faces were missed in phase 3 than phase 2; however, this did
not differ by age group or facial expression condition. The two-way interaction of age
group x task phase was significant and showed mean intensity key pressed significantly
increased for older adults across task phase compared to younger adults.

DISCUSSION


The  goal  of  the  current  study  was  to  investigate  whether  Chernoff  face  stimuli 
could serve as ambient (i.e., relatively resource-free) indicators of quantitative
information, using a dual-task paradigm. It was hypothesized (H1) that a significant
three-way interaction would occur between face presented x age group x facial expression
condition  for  decoding  performance  in  the  single-task  phase.  Both  age  groups  were 
expected  to  have  similar  decoding  accuracy  (i.e.,  similar  regression  slopes)  in  the  happy 
facial  expression  condition,  but  non-similar  slopes  in  the  sad  facial  expression  condition. 
This  age-related  difference  in  decoding  accuracy  as  a  function  of  facial  expressions  being 
happy  or  sad,  was  based  on  literature  indicating  positive  facial  expression  provided  a 
decoding advantage (Bartneck & Reichenbach, 2005; Calvo & Lundqvist, 2008;
Rellecke, 2011), and literature that suggested older adults could decode positive facial
expressions as accurately as younger adults (Orgeta & Phillips, 2007).
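
The notion of decoding accuracy as a regression slope can be illustrated with a minimal sketch. It assumes that face presented is coded 0-9 (0 % to 90 % intensity) on the same scale as the intensity key pressed, so that perfect decoding yields a slope of 1. The helper `ols_slope` and both response vectors are hypothetical and are not data or code from the study.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Assumed coding: intensity levels 0-9 stand for 0 % to 90 % in 10 % steps.
face_presented = list(range(10))

# Hypothetical responses (illustration only, NOT data from the study).
perfect_decoder = list(range(10))                     # key pressed matches the face
compressed_decoder = [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]   # responses cluster mid-scale

slope_perfect = ols_slope(face_presented, perfect_decoder)
slope_compressed = ols_slope(face_presented, compressed_decoder)

print(f"perfect decoder slope:    {slope_perfect:.2f}")
print(f"compressed decoder slope: {slope_compressed:.2f}")
```

Under this coding, a shallower slope means the key pressed tracks the presented intensity less closely, which is how a face presented x age group interaction would reflect a group difference in decoding accuracy.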

Hypothesis  1:  A  Three-Way  Interaction  of  Age  Group,  Facial  Expression  Condition,  and 
Face  Presented 

Hypothesis  1  was  not  fully  supported.  The  current  experiment  revealed  that  the 
interaction  between  face  presented  x  age  group  x  facial  expression  condition  for  decoding 
performance  in  the  single-task  phase  was  not  significant.  However,  it  was  found  that  the 
relationship  between  younger  and  older  adults’  decoding  accuracy  did  significantly 
change  due  to  facial  expression  condition.  There  was  an  age-related  difference  in 
decoding  accuracy  in  the  happy  face  condition.  Younger  adults’  significantly  higher 
decoding accuracy in the happy facial expression condition was unexpected due to the
“happy face advantage” that was anticipated for older adults (Ekman & Friesen, 1975;
Orgeta  &  Phillips,  2007;  Calvo  &  Lundqvist,  2008).  There  was  not  an  age-related 
difference  in  decoding  accuracy  in  the  sad  face  condition.  The  absence  of  an  age-related 
difference  in  decoding  accuracy  in  the  sad  facial  expression  condition  was  also 
unexpected.  The  similarity  of  decoding  accuracy  performance  between  younger  and  older 
adults  in  the  sad  face  condition  was  not  hypothesized,  and  may  be  evidence  of  the  lack  of 
a  negativity  effect  for  younger  adults,  which  was  based  on  previous  research  (Lynchard 
&  Radvansky,  2012). 

Participants (collapsed across age group) had higher decoding accuracy when
they were presented with happy facial expressions. This finding supports a general
“happy face advantage” across age group and suggests that when compared to sad
Chernoff facial expressions, happy Chernoff facial expressions are more advantageous
for decoding. In terms of using a Chernoff face for the display of quantitative
information, happy facial expressions were shown to be a more
decodable stimulus overall. This finding corroborates previous research that also provides
evidence of more accurate happy face decoding (Hess, 1997). While this finding doesn’t
fully support hypothesis 1, it does add support to the general hypothesis that happy
Chernoff faces would be decoded more accurately than sad Chernoff faces.

Younger  adults  had  significantly  faster  response  times  compared  to  older  adults, 
regardless  of  the  facial  expression  condition.  This  was  not  expected  and  did  not  support 
the  hypothesis  that  happy  facial  expression  would  allow  older  adults  to  maintain  a  similar 
response time as younger adults in the happy facial expression condition (i.e., happy face
advantage). Previous research showing the capacity for quick decoding of happy facial
expressions (Calvo & Lundqvist, 2008) was paired with socioemotional selectivity
theory (Carstensen, Isaacowitz, & Charles, 1999) to arrive at the prediction that older adults
would decode happy facial expressions quickly. Since response time was interpreted as
a measure of attentional demand on the participant, it was inferred that older adults
incurred a higher attentional demand when performing the facial decoding task. The
non-significant main effect of facial expression condition showed that happy and sad facial expressions
were  responded  to  with  similar  response  times  within  age  groups.  This  was  expected  for 
younger  adults  (i.e.,  no  decrement  in  response  time  due  to  facial  expression  condition), 
but  not  for  older  adults.  The  non-significant  difference  for  older  adults’  response  times  in 
terms  of  facial  expression  condition  indicates  no  response  time  advantage  for  either  facial 
expression. 

The  main  effect  of  facial  expression  condition  on  faces  missed  indicated 
participants  in  the  sad  facial  expression  condition  missed  significantly  more  faces  than 
participants  in  the  happy  facial  expression  condition.  This  supports  the  general  idea  that 
happy  faces  are  more  quickly  (i.e.,  perhaps  pre-attentively)  decoded  than  sad  faces.  This 
finding partially supports hypothesis 1. It was expected for older adults to miss
significantly more sad facial expressions, but younger adults were expected to see no
change in faces missed across facial expression condition. The main effect of facial
expression condition showed that sad Chernoff faces were missed significantly more
regardless of age group. However, this preliminary finding indicating a pre-attentive or
resource-free quality of happy faces was more thoroughly investigated in phase 3, where
additional  attentional  demand  was  placed  on  the  participants. 

The  finding  of  participants’  significantly  higher  decoding  accuracy  for  happy 
facial  expressions  can  be  paired  with  participants’  lower  amount  of  misses  for  happy 
facial  expressions.  This  forms  a  case  that  happy  facial  expressions  are  generally  more 
easily  decodable  than  sad  facial  expressions,  which  is  consistent  with  previous  research 
(Hess, 1997; Bartneck & Reichenbach, 2005; Calvo & Lundqvist, 2008). The results
of testing H1 gave evidence that happy facial expressions have a
significant advantage for decoding in situations of low attentional demand. However, it
is important to remember that older adults performed significantly worse than younger
adults in terms of decoding accuracy (when collapsed across facial expression condition)
and response time. This suggests that older adults had difficulty decoding the Chernoff
facial expressions. Because of this finding, the ability of Chernoff facial expressions to
transcend age group as a type of ambient display is suspect.

An aspect of the current study that may have contributed to the absence of an
older adult happy face advantage (in phase 2) was the number of intensity levels for the
variable of face presented. Unlike previous studies (Hess, 1997; Orgeta & Phillips, 2007),
faces in the current study changed incrementally by 10 % on a scale from 0 % - 90 %.
Thus, we may have increased the amount of discrimination required of our participants. It
was shown in previous research that 10 % intensity level steps were too small to be
discriminated, and participants were not as accurate in their decoding (Bartneck &
Reichenbach, 2005).

The manipulation of only one facial feature may not have been optimal for facial
expression  decoding  in  adults.  A  plausible  explanation  for  older  adults’  lower  decoding 
accuracy  was  the  simplistic  level  of  face  manipulation  used  on  the  Chernoff  faces  (i.e., 
only  the  mouth  was  manipulated).  Perceiving  slight  changes  in  mouth  curvature  of  the 
Chernoff  faces  may  have  been  too  difficult  a  task  for  older  adults.  A  previous  study 
suggested that children (ages 11-12) were more successful at recognizing changes in
single  features  (e.g.,  mouth,  eyebrows)  than  adults  (ages  20-45)  (Tsurusawa,  Goto, 
Mitsudome,  Nakashima,  &  Tobimatsu,  2007).  This  was  due  to  the  lack  of  development  of 
holistic  facial  expression  decoding  in  children.  The  current  study  generalizes  this  finding 
to  older  adults  due  to  their  observed  lower  slope  value  in  facial  decoding  accuracy. 
Potentially,  the  ability  for  people  to  discern  slight  manipulations  of  a  single  facial  feature 
is  negatively  associated  with  age.  The  concept  of  a  “pseudo-Chernoff  face”,  which 
manipulated  only  one  facial  feature,  was  shown  to  be  difficult  for  older  adults  to  decode. 
Although  the  percentage  information  conveyed  by  the  Chernoff  face  was  univariate  in 
nature,  it  may  be  more  helpful  to  manipulate  multiple  facial  features  to  communicate 
such  information.  The  holistic  manipulation  of  a  face  (i.e.,  mouth,  eyes,  eyebrows,  etc.) 
could  provide  a  better  decoding  accuracy  for  both  younger  and  older  adults.  The  idea 
presented  by  Montello  and  Gray  (2005)  of  communicating  data  univariately  seems  to 
have  been  misapplied  to  facial  expression  in  the  current  study.  Unintentionally,  we  may 
have  created  a  more  difficult  decoding  task  by  manipulating  only  one  facial 
characteristic.

Hypothesis 2: A Four-Way Interaction of Age Group, Facial Expression Condition, Face
Presented, and Task Phase

It  was  hypothesized  (H2)  that  participants’  performance  across  age  groups  in  the 
dual-task  condition  would  not  significantly  decline  when  in  the  happy  facial  expression 
condition,  while  a  dual-task  cost  would  be  observed  in  the  sad  facial  expression 
condition.  This  expected  finding  was  linked  to  the  happy  face  advantage  used  as  a  basis 
for  hypothesis  1  (Ekman  &  Friesen,  1975;  Orgeta  &  Phillips,  2007;  Calvo  &  Lundqvist, 
2008). 

The four-way interaction associated with hypothesis 2 was not supported; the analysis
confirmed that the three-way interaction of face presented x age group x facial expression
condition did not significantly differ across task phase. Decoding accuracy in the dual-task
phase was statistically similar to the single-task phase. Every interaction that
involved  decoding  accuracy  as  a  function  of  task  phase  yielded  non-significant  results. 
This  was  an  unexpected  finding  and  presents  a  question  as  to  why  there  was  no  dual-task 
cost. 

The main effect of task phase and main effect of age group on response time
suggest that the dual-task phase was contributing to a decrease in performance.
Therefore, the prediction that happy facial expressions do not produce a significant
increase in response time was not supported. The happy face stimuli used in our study
were not immune to dual-task cost. As previous research has stated (Morris, 1998;
Whalen, 1998), the potential advantage of using a face as an ambient display is the face’s
ability to not add any cognitive load on the user, specifically in an attentionally demanding
situation. Response time data have shown Chernoff facial expressions do not meet this
requirement, and hence may not be good ambient displays. The main effect for age group
suggested that older adults were significantly slower at decoding facial expressions. The
slower response time for older adults was also seen in the single-task phase.

The number of misses a participant incurred was significantly different based on
task phase. Participants recorded significantly more misses on average (by a factor of more than 4)
in the dual-task condition than the single-task condition. Just as response time indicated a
dual-task cost, so did the number of misses observed for participants. This finding does
not fully support hypothesis 2. Since misses significantly increased for both happy and
sad facial expressions, there was no apparent happy face advantage. The significant main
effect for facial expression condition shown in phase 2 (i.e., sad faces yielded more
misses) was not shown in phase 3.
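
The size of the dual-task cost can be recomputed from the means reported in the Results. The sketch below is only an arithmetic illustration: the helper name `dual_task_cost` is hypothetical, and the inputs are the reported single-task and dual-task means for face misses and response time.

```python
def dual_task_cost(single, dual):
    """Return (absolute change, ratio) going from single-task to dual-task."""
    return dual - single, dual / single


# Means reported in the Results section of this report.
miss_diff, miss_ratio = dual_task_cost(single=7.24, dual=33.55)   # face misses
rt_diff, rt_ratio = dual_task_cost(single=1.59, dual=1.72)        # response time (s)

print(f"misses: +{miss_diff:.2f} ({miss_ratio:.2f}x the single-task mean)")
print(f"response time: +{rt_diff:.2f} s ({rt_ratio:.2f}x the single-task mean)")
```

On these means, the cost in face misses is far larger in relative terms than the cost in response time, although both effects were statistically significant.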

Participants’ number of blocks cleared for the block game (in the dual-task phase)
was significantly different based on age group. Younger adults cleared more blocks than
older adults when completing the dual-task. This finding suggests that younger adults
were able to complete the primary block task at a higher level than older adults. There
was no significant main effect of facial expression condition, which showed participants
did not significantly differ in number of blocks cleared based on the facial expression
condition in which they were placed.

One potential answer to the question of why there was no dual-task cost for decoding accuracy is that
the primary task in the dual-task phase was not engaging enough. The relationships for
the two-way interactions observed in phase 2 may not have significantly changed in
phase 3 because participants were not being exposed to a highly attention-demanding
situation (i.e., relative to phase 2). However, the data from response time and number of
face misses provide evidence that the dual-task condition was causing dual-task cost
among participants. The lack of dual-task cost for decoding accuracy may be explained by the
significant  difference  observed  between  decoding  accuracy  as  a  function  of  age  group  in 
phase  2.  Younger  adults  had  a  significantly  higher  decoding  accuracy  (collapsing  across 
facial  expression  condition)  than  older  adults  in  the  single-task  phase  (phase  2).  However, 
younger  and  older  adults  may  have  experienced  a  floor  effect  in  decoding  accuracy  that 
prevented  the  expected  significant  decrease  in  decoding  accuracy  (in  the  sad  facial 
expression  condition)  from  phase  2  to  phase  3.  This  indicates  that  participants’ 
significantly  lower  decoding  accuracy  for  sad  Chernoff  facial  expressions  might  not  be 
directly  due  to  the  additional  attentional  demand  of  phase  3,  but  is  due  to  the  general 
difficulty  of  decoding  the  sad  Chernoff  facial  expressions.  Similar  to  the  single  task 
phase,  the  facial  expression  stimuli  may  not  have  conveyed  emotion  clearly  enough 
(possibly  due  to  the  manipulation  of  only  one  facial  feature)  to  result  in  the  expected 
three-way  interaction  across  task  phase. 

One  possibility  for  the  consistent  slower  response  times  for  older  adults,  as 
previously  mentioned,  is  related  to  the  stimuli.  The  stimuli  were  potentially  more  difficult 
for  the  older  adults  to  decode.  This  detracts  from  the  universal  usability  (i.e.,  usable  for 
all  age  groups)  of  Chernoff  faces  as  a  method  for  communicating  information.  A  second 
possibility is that the input of decoding facial expression was more physically taxing for
the older adults. Using the number pad may have been a difficult input for older adults
who have joint disorders (e.g., arthritis) or other physical ailments. A more novel input
mode (e.g., speech) may provide a way to avoid the confounding variable of input
mechanism.

When  looking  at  the  response  time  and  face  misses  data,  there  is  an  underlying 
concept  pertaining  to  Chernoff  faces  that  may  explain  the  dual-task  cost.  Previous 
research  claimed  that  Chernoff  faces  were  not  processed  in  parallel  and  were  more 
difficult to decode (Morris, Ebert, & Rheingans, 2000). The concept that Chernoff faces are
not  pre-attentive  and  are  processed  serially  adds  support  to  the  dual-task  cost  seen  in  the 
current  study. 

The  age-related  effect  found  for  the  number  of  blocks  cleared  gave  evidence  that 
younger  adults  became  better  adapted  to  the  dual-task  phase  than  older  adults.  The 
proficiency  shown  by  younger  adults  in  the  block  task  could  help  explain  why  there  was 
a  younger  adult  advantage  for  decoding  accuracy  in  the  dual-task  phase.  Older  adults’ 
significantly  lower  decoding  accuracy  in  the  dual-task  could  be  attributed  to  the  difficulty 
of  the  block  task.  The  cognitive  demands  of  the  block  task  may  have  caused  older  adults 
to  experience  a  significant  performance  decrement  when  compared  to  younger  adults,  in 
both the number of blocks cleared and decoding accuracy. Due to the lack of an effect of
facial expression condition, it can be inferred that the happy face advantage shown in the
dual-task was not due to participants’ inappropriate allocation of attention in the dual-task.
Essentially, participants’ higher decoding accuracy in the happy face condition was
not due to their neglect of the primary task.

In sum, the results gained from the comparison of performance measures across
task phase indicated attention-demanding environments degrade the decoding of Chernoff
faces. While decoding accuracy performance did not show a dual-task cost, response time
and number of face misses revealed a significant dual-task cost. Based on decoding
accuracy performance, happy facial expressions appear to be more beneficial than sad
facial expressions in an attention-demanding environment. Even though the happy facial
expression  condition  shows  significantly  higher  decoding  accuracy,  it  is  not  immune  to 
dual-task  cost  in  terms  of  response  time  and  the  amount  of  misses  incurred.  Younger 
adults  experienced  less  decrement  in  overall  performance  compared  to  older  adults  in  the 
dual-task.  Results  from  the  number  of  blocks  cleared  by  participants  in  the  dual-task 
phase showed younger adults outperformed older adults on the primary task. The block
game  appeared  to  be  more  cognitively  demanding  for  older  adults,  which  may  have  led  to 
lower  decoding  accuracy.  The  dual-task  cost  seen  for  response  time  and  face  misses 
indicated  that  Chernoff  facial  expressions  create  a  significant  demand  on  users’  attention. 
Therefore,  Chernoff  faces  do  not  have  an  observed  benefit  for  communicating 
information  in  a  resource-free  manner. 

There  were  a  few  limitations  to  this  study  that  could  be  improved  upon  in  future 
research. The facial expression stimuli could have been manipulated to take advantage
of more facial features when conveying expression. Future studies could measure
decoding performance for Chernoff faces with variations of manipulated facial
characteristics (e.g., manipulation of mouth and eyes, versus manipulation of mouth,
eyes, and eyebrows). Another limitation was only having participants complete a
NASA-TLX survey after the dual-task phase. It would be beneficial to have participants
complete the NASA-TLX survey after the single-task as well. This would allow for
comparison  of  subjective  workload  between  task  phases  in  an  effort  to  gain  another 
measure  of  dual-task  cost.  A  trust  rating  measure  was  not  included  in  the  current  study, 
but  could  be  in  a  future  study  as  a  measure  of  subjective  trust  concerning  the  facial 
expressions. It would be interesting to observe how a participant’s trust is affected by the
independent variables of age, facial expression intensity, and facial expression condition.
Understanding  which  faces  receive  significantly  different  trust  ratings  would  add  an 
interesting  element  to  a  future  study.  Another  improvement  for  the  current  study  involves 
the  placement  of  the  Chernoff  face  in  the  computer  program.  The  peripheral  position  of 
the  Chernoff  face  may  have  put  participants  at  a  disadvantage  for  decoding.  A  future 
study  may  place  the  facial  expression  in  a  more  centralized  location.  A  final  improvement 
could  be  to  add  more  facial  expression  conditions.  Previous  literature  has  expressed  an 
“anger superiority” effect (Öhman, Lundqvist, & Esteves, 2001), which could be
investigated  using  Chernoff  facial  expressions. 

CONCLUSION 

The  results  of  this  study  suggest  that  Chernoff  faces  communicate  facial 
expression  more  effectively  when  happy  facial  expressions  are  used.  However,  older 
adults have more difficulty in decoding Chernoff facial expressions. There is also a dual-task
cost for the decoding of Chernoff faces in terms of increased response time and a
higher number of faces missed. The ability for Chernoff faces to act as effective ambient
displays was not supported by this study, but more research on Chernoff faces should be
conducted to further explore their usefulness in communicating information.

APPENDIX A


Screenshot of Block Game Task (Phase 1)


Instructions  for  Task  1 


First,  you  will  get  some  practice  on  a 
game. 

In  this  game,  you  must  match  at 
least  three  blocks  vertically  or 
horizontally  of  the  same  color.  But 
you  can  only  switch  any  two  blocks 
horizontally. 

Use  the  cursor  keys  (up,  down,  left, 
right)  to  move  your  selector. 

Press  the  space  bar  to  switch  blocks. 

Please work as quickly as you can to
increase your score. This part of the
study will end automatically.

Click "Start practice" to begin.

APPENDIX B


Screenshot  of  Facial  Expression  Decoding  Task  (Phase  2) 


Instructions  for  Task  2 


That is the end of your game
practice. Do you have any
questions?

Now we will move to your other task
practice. On the right side of the
screen you will see a face appear in
the white box.

When you see the face you should
identify the level of emotional
expression on the presented face.

You will use the keys from 0 to 9 to
indicate no expression (key 0) to
high expression (key 9) and any in-between.

You should use your own
judgment--there are no right or
wrong answers.

When you are ready to begin, please
click "Start practice".

APPENDIX C


Chernoff  Facial  Expression  Stimuli  Organized  by  Expression  and  Intensity 


Neutral  Facial  Expression 


% 0


Happy  Facial  Expressions 


%  10  20  30  40  50  60  70  80  90 


Sad  Facial  Expressions 


% 10 20 30 40 50 60 70 80 90

APPENDIX D


Screenshot  of  Block  Task  and  Facial  Expression  Decoding  Task  (Phase  3) 


Instructions  for  Task  3 

Now, you will do both tasks at the
same time. That is, you will have
the blocks game and the face
identification task occurring at the
same time.

Like before, you will control the
blocks game by using the cursor
keys (up, down, left, right) and the
space bar to switch any two blocks
horizontally.

You will also identify the level of
emotion expressed on the
presented face in the far right.

Like before, you will use the keys
from 0 to 9 to indicate no
expression (key 0) to high
expression (key 9) and any in-between.

Doing these two tasks at the same
time is very challenging. Your
main focus should be the blocks
game. You should try to
maximize your score as quickly as
possible.

Any reserve attention you have
available should be used for the
face identification task.

Do you have any questions?
Please ask the experimenter now.

If you are ready, please click "Start
experiment".


Score: 3

Abstract

The purpose of the current study is to examine the extent to which the appearance, task, and
reliability of a robot are susceptible to stereotypic thinking. Stereotypes can influence the types of
causal attributions that people make about the performance of others. Just as causal attributions
may affect an individual’s perception of other people, they may similarly affect perceptions of
technology. Stereotypes can also influence perceived capabilities of others. That is, in situations
where stereotypes are activated, an individual’s perceived capabilities are typically diminished.
The tendency to adjust perceptions of capabilities of others may translate into levels of trust
placed in the individual’s abilities. A cross-sectional factorial survey using video vignettes will
be utilized to assess young adults’ and older adults’ attitudes toward a robot’s behavior and
appearance. We hypothesize that a robot’s older appearance will result in lower levels of trust,
more dispositional attributions, and lower perceptions of capabilities while high reliability should
positively impact trust.

Investigating older adults’ trust, causal attributions, and perception of capabilities in robots as a
function of robot appearance, task, and reliability

When interacting with technology, people focus on human-like qualities of the
technology more than the asocial nature of the interaction (Reeves & Nass, 1996; Nass & Moon,
2000), attributing human-like qualities such as personality, mindfulness, and social characteristics.
The attribution of human-like qualities makes technology susceptible to stereotyping based on
appearance and etiquette (e.g., Nass & Lee, 2001; Parasuraman & Miller, 2004; Eyssel &
Kuchenbrandt, 2012). For example, when a male or female anthropomorphic computerized aid
was included in a trivia task, participants were more likely to trust the male aid’s suggestions and
ranked the female aid as less competent (Lee, 2008).

The purpose of the current study is to examine the extent to which the appearance, task,
and reliability of a robot are susceptible to stereotypic thinking. The theoretical relevance is
that the results of this study will inform the limits of stereotypic thinking by investigating
whether stereotypes are applied to robots. The practical relevance is that the current study may
inform the design of robots to enhance human-robot interaction, particularly for older adults who
tend to be less accepting of technological aids than other age groups (Czaja et al., 2006).

Stereotypes and Aging

In  order  to  make  efficient  social  judgments  about  others,  individuals  rely  on  the  use  of 
heuristics.  One  example  heuristic  involves  placing  an  individual  into  a  pre-determined  schema 
(i.e.,  a  stereotype).  Stereotypes  are  cognitive  shortcuts  that  result  in  impressions  of  others  (e.g., 
Ashmore  &  Del  Boca,  1981).  Therefore,  older  adults  may  be  more  likely  than  younger  adults  to 
apply  stereotypes  when  they  do  not  have  other  sources  of  information  available  to  them  (i.e., 
under situations of ambiguity).

Stereotypes are more likely to be activated in domains that are inconsistent with
prescriptive societal gender or age roles (e.g., Kuchenbrandt, Haring, Eichberg, Eyssel, & Andre,
2014). For example, individuals perceived a female-voiced computer to be more informative
about romantic relationships than the male-voiced computer (Nass, Moon, & Green, 1997).
Although  gender  stereotypes  have  been  studied  using  anthropomorphic  technological  aid 
paradigms, aging stereotypes have been investigated to a lesser degree within this context. Pak,
McLaughlin, & Bass (2014) examined whether the physical appearance of an anthropomorphic
aid  would  activate  stereotypic  thinking  and  affect  individuals’  trust  in  the  aid.  Using  a  factorial 
design,  Pak  et  al.  manipulated  the  technological  aid’s  gender  and  age  (younger,  older)  as  well  as 
participants’  perceptions  of  the  reliability  of  the  automation.  Participants  were  told  that  the 
automation  was  either  45%,  70%,  or  95%  reliable.  However,  the  automation  always  provided  a 
correct  answer  during  testing.  The  task  in  this  study  was  a  health  behaviors  test  regarding 
participants’  knowledge  about  diabetes.  Before  beginning  the  task,  participants  were  told  that  the 
automated aid was a doctor-recommended smartphone application designed to help people
make the best decisions about diabetes. As the participants answered each question, the decision
aid smartphone app would appear on the screen and the agent would recommend a correct
answer. All of the agents were dressed as doctors. Participants rated their subjective trust in the
automation and whether they would actually use the advice of the application on a 1-7 Likert
scale.

Pak, McLaughlin, & Bass (2014) found that both younger and older adult participants
trusted  the  older  anthropomorphic  aids  more  than  the  younger  aids,  the  male  aids  more  than  the 
female  aids,  and  more  reliable  applications  than  less  reliable  applications.  However,  stereotypic 
thinking was activated when perceptions of reliability were low or ambiguous. When the app had
low reliability, the younger female aid was trusted less than the younger male aid. Also, under
medium  reliability,  the  older  female  aid  was  trusted  less  than  the  older  male  aid.  These  results 
suggest  that  trust  in  automation  can  be  influenced  by  physical  appearance  (i.e.,  gender  and 
perceived age) of the technology. These results further support the notion that technology is,
like humans, susceptible to stereotyping.

Physical  appearance  is  known  to  play  a  large  role  in  the  activation  of  aging  stereotypes. 
The  link  between  physical  characteristics  and  stereotypes  has  been  well  established  in  the  social 
cognition literature (Brewer & Lui, 1984; Hummert, 1994; Hummert, Garstka, & Shaner, 1997).
Within  this  context,  facial  features  are  considered  to  be  the  main  source  of  information  used  in 
order  to  activate  stereotypes.  Hummert  et  al.  (1997)  found  that  negative  age  stereotypes  were 
associated  with  the  perception  of  advanced  age  through  facial  photographs.  Overall,  these 
findings  suggest  that  physical  cues  are  major  indicators  within  the  context  of  social  judgments. 

Stereotypes  about  older  adults,  although  pervasively  negative,  can  be  multidimensional  in 
the  right  context.  People  hold  both  positive  and  negative  stereotypes  about  older  adults 
(Hummert,  1993).  When  adults  of  all  ages  completed  a  trait  card-sorting  task  where  they  were 
asked  to  generate  traits  they  associated  with  older  adults,  Hummert  and  colleagues  (1994)  found 
approximately  10  different  aging  stereotypes,  including  positive  ones  like  the  “golden  ager”  who 
leads  an  active  and  engaged  lifestyle.  Although  many  stereotypes  are  held  in  common  by  people 
of  all  ages,  aging  stereotypes  tend  to  become  increasingly  differentiated  as  people  grow  older 
(Hummert,  1993;  Hummert  et  al.,  1994). 

Stereotypes  and  other  social  beliefs  can  influence  the  way  in  which  individuals  process 
information  in  order  to  form  social  judgments,  including  the  types  of  causal  attributions  that 
people make about the performance of others (Fiske & Taylor, 1991). When trying to determine
the causality of an event, people tend to use two types of information: internal or dispositional
qualities  of  the  individuals  involved  in  an  outcome  and  the  influences  of  the  situation  itself 
(Gilbert,  1993;  Krull,  1993;  Krull  &  Erikson,  1995).  Potential  biases  in  the  attribution  process 
can  occur  as  a  function  of  the  valence  of  the  situational  outcome,  the  degree  of  ambiguity  of  the 
situation  (or  of  the  information  given  about  causal  factors),  and  the  controllability  of  the  situation 
(Blanchard-Fields, 1994). Blanchard-Fields suggested that, in general, older adults are most
likely to make dispositional attributions when the outcome of a situation is negative and the
actor’s role in the outcome is ambiguous. When personal beliefs about another individual or
situation are violated, older adults are also more likely to make dispositional attributions
of blame rather than situational ones (Blanchard-Fields, 1996; Blanchard-Fields, Hertzog, & Horhota,
2012). Just as causal attributions, or the extent to which behavior is attributed to situational or
dispositional causes, may affect an individual’s perception of other people, they may similarly
affect perceptions of technology. For example, blaming technology for unreliable performance is
likely to induce less trust (Moray, Hiskes, Lee, and Muir, 1995; Madhavan, Wiegmann, &
Lacson, 2006). Attribution of fault has been studied in the automation literature and has been referred to
as automation bias (Mosier & Skitka, 1996). Automation bias has been defined “as a heuristic
replacement for vigilant information seeking and processing” (Mosier & Skitka, p. 202) which
results in increased omission errors and commission errors.

Expectations  of  performance  outcomes  are  influenced  by  stereotypes.  Adults  of  all  ages 
expect memory performance to decline with age (Lineweaver and Hertzog, 1998). Similarly,
older adults’ abilities are perceived negatively in domains involving memory (Kite & Johnson,
1988; Kite, Stockdale, Whitley & Johnson, 2005) and physical well-being (Davis & Friedrich,
2010). In memory-taxing situations, older adults are perceived as being less credible and less
accurate (Muller-Johnson, Toglia, Sweeney, & Ceci, 2007). The tendency to adjust perceptions
of  capabilities  of  others  based  on  appearance  may  translate  into  levels  of  trust  placed  in  the 
individual’s  abilities. 

Trust  in  Automation 

Trust in technological agents is important because it affects an individual’s willingness to
accept a robot’s input, instructions, or suggestions (Lussier, Gallien, & Guiochet, 2007). For
example, Muir and Moray (1996) found a strong positive relationship between adults’ level of
trust in an automated system and the extent to which they allocated control to the automated
system. Interestingly, Muir (1987) suggests that people’s trust in technology is affected by
factors that are also the basis of interpersonal trust. Trust in automation is thought to develop
over time (Maes, 1994), suggesting that trust is influenced by past experiences with the
technology. For example, Merritt and Ilgen (2008) describe dispositional trust as the trust placed
in a person or automation during a first encounter before any interaction has occurred, while
history-based trust reflects the prior experience a person has with another person or automation.

Performance-based factors have a large influence on perceived trust in HRI (Brule, Dotsch,
Bijlstra, Wigboldus, & Haselager, 2014). In fact, a recent meta-analysis suggests that a robot’s
task performance was the most important factor in adults’ trust in robots (Hancock et al., 2011).
That is, if the robot performs reliably, the human will exhibit greater trust towards the robot. The
same meta-analysis found that behavior, proximity, and size of the robot also affect trust to a
lesser extent. However, human-automation trust literature suggests that appearance can have
reliable effects on trust (Pak, Fink, Price, Bass, & Sturre, 2012). Indeed, studies in the social
literature have found that people often judge an individual’s levels of trustworthiness based on
facial appearance (Oosterhof & Todorov, 2008) and that trust judgments can be formed after
only a brief exposure (100 ms) to a face (Willis & Todorov, 2006). It is also important for the
robot’s  appearance  to  be  compatible  with  its  function  at  face  value.  Goetz,  Kiesler,  &  Powers 
(2003)  found  that  people  are  more  likely  to  accept  a  robot  when  its  appearance  matches  its 
perceived  capabilities.  This  is  thought  to  be  the  case  because  when  there  is  a  high  level  of 
compatibility between appearance and functionality, users’ expectations are confirmed, boosting
confidence  in  the  robot’s  performance.  However,  when  appearance  and  capabilities  are 
incompatible,  user  expectations  are  violated,  which  can  result  in  lower  levels  of  trust  (Duffy, 
2003). 

Because human-robot interaction is a new field of study, there are many gaps in the
literature, especially regarding the social influences on HRI. First, although there is evidence to
suggest that stereotypes can affect performance and interactions with anthropomorphized
technological aids, we do not know how pre-existing age stereotypes will affect HRI. Next, it is
unclear how trust might be moderated by task type and reliability. Although the automation
literature suggests that reliability can influence trust, to our knowledge the relationship between
robot task domain and trust has not yet been investigated. Finally, how does stereotyping
technology affect perception of capabilities and the causal attributions made about performance?

The Current Study

The  purpose  of  this  study  is  to  better  understand  the  factors  that  influence  older  adults' 
trust  in  robots.  Specifically,  we  are  investigating  whether  the  robots’  appearance,  task  domain, 
and  reliability  of  the  robot’s  performance  influence  trust  in  the  automation.  A  cross-sectional 
factorial  survey  study  will  be  utilized  using  video  vignettes  to  assess  participants’  attitudes 
towards  the  robots’  behavior  and  appearance.  Each  vignette  will  include  manipulations  of  the  age 
of  the  robot,  the  domain  of  the  collaborative  task,  and  the  reliability  of  the  robot’s  performance. 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


Dependent  measures  will  include  the  level  of  trust  participants  exhibit  toward  the  robot,  causal 
attributions  regarding  the  robot’s  performance,  and  perceived  capabilities  of  the  robot. 

It  is  hypothesized  that  manipulating  a  robot’s  appearance,  level  of  reliability,  and  the  task 
type  will  have  an  effect  on  the  level  of  trust  that  an  older  adult  exhibits  toward  a  robot,  the  causal 
attributions  that  the  individual  makes  about  the  robot’s  performance,  and  people’s  perceptions  of 
the  capabilities  of  the  robot.  Specifically,  trust  in  the  robot  should  be  highest  when  the  task  is 
stereotypically  congruent  with  the  robot’s  appearance  (e.g.,  a  younger  adult  performing  a 
cognitive  task  instead  of  an  older  adult  performing  a  cognitive  task)  and  its  performance  is 
reliable. This is hypothesized because appearance influences people’s trust in automation (Pak,
Fink, Price, Bass, & Sturre, 2012) and aging stereotypes are less likely to be activated while
interacting with the younger robot. The attributions about the robot’s performance may be more
dispositional  when  reliability  is  low  and  the  task  is  incongruent  with  the  robot’s  appearance.  This 
is  because  older  adults  are  more  likely  to  make  dispositional  (i.e.,  internal)  attributions  of  blame 
when  an  outcome  of  an  event  is  perceived  as  negative  (the  unreliable  condition)  and  when  their 
beliefs  are  violated  (i.e.,  when  an  older  looking  robot  performs  the  cognitive  and  physical  tasks; 
Blanchard-Fields, Hertzog, & Horhota, 2012). Perceived capabilities of the robot are
hypothesized to depend on the robot’s appearance. That is, capability ratings are expected to be
higher when the younger looking robot performs the tasks, and ratings are expected to be lower
when  an  older  looking  robot  performs  the  tasks.  This  is  expected  because  adults’  capabilities  in 
cognitive  and  physical  domains  are  expected  to  decline  with  age  (Kite,  Stockdale,  Whitley,  & 
Johnson, 2005; Davis & Friedrich, 2010). Task domain will be treated as an exploratory variable.
However, based on automation trust literature suggesting that trust in robots’ capabilities might
depend on the domain in which they are placed (e.g., industry, entertainment, social; Schaefer,
Sanders, Yordon, Billings, & Hancock, 2012), it is hypothesized that there will be a main effect
of  task  domain  such  that  participants  will  have  more  trust  in  the  robot  and  have  higher  ratings  of 
perceived  capabilities  when  the  robot  performs  physical  tasks. 

Method 

Participants 

Fifty younger adults and 50 older adults will complete the study. Younger adults will be
undergraduate students who receive extra credit for participation. Older participants will be
normatively aging older adults recruited from the community and will receive $15 for their
participation.

Measures 

Individual  Difference  Measures.  Demographic  information,  vocabulary  (Shipley 
vocabulary;  Shipley,  1986),  perceptual  speed  (digit-symbol  substitution;  Wechsler,  1997),  and 
working  memory  (automated  operation  span;  Unsworth,  Heitz,  Schrock,  &  Engle,  2005)  will  be 
measured.  The  Complacency  Potential  Rating  Scale  (CPRS;  Singh,  Molloy,  and  Parasuraman, 
1993) is designed to measure complacency towards different types of automation. Participants
will indicate the extent to which they agree with statements about automation on a scale of 1-5.

Subjective  Trust.  Trust  will  be  measured  by  asking  the  participants  how  much  they 
trusted the robot portrayed in the vignette. Responses will be recorded on a Likert scale from 1
(not  at  all)  to  7  (very  much).  The  larger  the  participants’  ratings,  the  higher  their  subjective  trust 
in  the  robot. 

Causal  Attributions.  Causal  attributions  will  be  measured  using  a  paradigm  adapted 
from Blanchard-Fields, Chen, Schocke, and Hertzog (1998). Participants will be asked to
indicate the degree to which either dispositional factors of the characters or situational factors
influenced the outcome of the scenario. Specifically, participants will indicate the extent to which:
(a)  the  robot  was  responsible  for  the  final  outcome,  (b)  the  robot  was  to  blame  for  the  final 
outcome,  (c)  the  final  outcome  was  due  to  personal  characteristics  of  the  robot,  (d)  the  final 
outcome  was  due  to  characters  in  the  story  other  than  the  robot,  (e)  the  final  outcome  was  due  to 
something  other  than  the  characters  in  the  story,  and  (f)  both  the  personal  characteristics  of  the 
robot  and  something  other  than  the  robot  contributed  to  the  final  outcome.  Participants  will 
respond using a Likert scale from 1 (very little) to 7 (very much). In order to classify the extent
to which participants attribute performance to either dispositional or situational variables, we
will sum the responses to items a-c, which represent dispositional attributions of performance, and
compare them with participants’ summed responses to items d-f, which represent situational
attributions of the final outcome. The higher the score on each of these two sums, the higher the degree
of dispositional or situational attribution, respectively.
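
The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's analysis code: the function name `score_attributions`, the dictionary keys, and the `difference` field are hypothetical, and the subtraction is only one possible reading of how the two sums might be compared.

```python
from typing import Dict, Mapping

# Item labels follow the (a)-(f) list in the text.
DISPOSITIONAL_ITEMS = ("a", "b", "c")
SITUATIONAL_ITEMS = ("d", "e", "f")


def score_attributions(ratings: Mapping[str, int]) -> Dict[str, int]:
    """Sum 1-7 ratings into dispositional (a-c) and situational (d-f) scores."""
    for item in DISPOSITIONAL_ITEMS + SITUATIONAL_ITEMS:
        value = ratings[item]
        if not 1 <= value <= 7:
            raise ValueError(f"item {item!r} must be rated 1-7, got {value}")
    dispositional = sum(ratings[i] for i in DISPOSITIONAL_ITEMS)
    situational = sum(ratings[i] for i in SITUATIONAL_ITEMS)
    return {
        "dispositional": dispositional,
        "situational": situational,
        # Assumption: a positive value means a more dispositional pattern.
        "difference": dispositional - situational,
    }


# One hypothetical participant's ratings for a single vignette.
example = score_attributions({"a": 6, "b": 5, "c": 7, "d": 2, "e": 1, "f": 3})
```

Each sum ranges from 3 to 21, so the two scores are directly comparable without rescaling.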

Perceived Capabilities. Perceived capabilities of the robot will be measured by using a
list of questions that span potential capabilities. Participants will be asked, “Based on the robot’s
behavior in the video you just watched, what other activities could the robot complete?”
Participants will be asked about further cognitive capabilities or motor capabilities of the robot.
That is, participants will rate their agreement regarding whether the robot could complete
similar cognitive or physical tasks. For example, participants could be asked, “Based on the
robot’s performance, could it also recommend stock investment picks?” or “Based on the robot’s
performance, could it also vacuum a room?” Afterward, participants will be asked to write a
short answer explaining what other tasks they thought the robot could do. Participants will rate
the extent to which they think the robot could perform certain tasks on a 1-7 Likert scale ranging
from “Definitely No” to “Definitely Yes” with higher scores indicating increased perceptions of
capabilities.

Factorial  Survey.  In  a  factorial  survey,  independent  variables  (i.e.,  factors  or 
dimensions)  are  treated  as  statistically  independent,  making  it  possible  to  identify  and  separate 
their  influences  on  judgments  (Rossi  &  Anderson,  1982).  In  the  current  study,  the  dimensions 
will  include  the  robot’s  age  appearance  (younger,  older),  task  domain  (cognitive,  physical)  with 
two  tasks  per  domain,  and  aid  reliability  (low,  high).  The  levels  of  the  dimensions  will  result  in 
12  factorial  combinations  or  scenarios.  Each  scenario  will  be  presented  twice,  creating  24 
vignettes. 
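
As a rough illustration of how the dimensions combine, the sketch below enumerates scenarios by crossing the levels exactly as they are listed (2 robot ages, 2 tasks in each of 2 domains, 2 reliability levels), using the task names given in the Design and Procedure section. All variable and function names are hypothetical. Under this literal reading the crossing yields 16 scenarios and 32 vignettes; the count of 12 stated above implies a crossing that the text does not spell out, so the sketch does not attempt to reproduce it.

```python
from itertools import product

# Levels as named in the text; variable names are hypothetical.
ROBOT_AGES = ("younger", "older")
TASKS_BY_DOMAIN = {
    "cognitive": ("sorting recycling", "sorting laundry"),
    "physical": ("moving boxes", "changing a light bulb"),
}
RELIABILITY_LEVELS = ("low", "high")


def build_scenarios():
    """Cross robot age, (domain, task) pairs, and reliability level."""
    domain_task_pairs = [
        (domain, task)
        for domain, tasks in TASKS_BY_DOMAIN.items()
        for task in tasks
    ]
    return [
        {"robot_age": age, "domain": domain, "task": task, "reliability": rel}
        for age, (domain, task), rel in product(
            ROBOT_AGES, domain_task_pairs, RELIABILITY_LEVELS
        )
    ]


scenarios = build_scenarios()
# Each scenario is presented twice, as described in the text.
vignettes = [scenario for scenario in scenarios for _ in range(2)]

print(len(scenarios), len(vignettes))
```

Because every factor is fully crossed, each level of one dimension appears equally often with each level of the others, which is what allows their influences on judgments to be separated.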

The stimuli for the robots were selected to portray a younger adult (Figure 1) and an older
adult (Figure 2). Because the current study will not manipulate the gender of the robot, the facial
stimuli  for  both  the  younger  and  older  condition  will  be  female.  In  order  to  control  for  potential 
effects  for  different  faces,  the  faces  selected  for  this  study  represent  an  age  progression  of  the 
same  female. 

The  robot  used  in  this  study  will  be  the  Baxter  robot  manufactured  by  Rethink  Robotics. 
Baxter  is  designed  as  a  manufacturing  robot  that  can  complete  tasks  that  involve  assembly  and 
object  organization  (Gear  &  Gadgets,  2014).  Adobe  Photoshop  CC  will  be  used  to  superimpose 
the facial stimuli onto the robot (Figure 3).

Each  video  vignette  will  contain  a  slideshow  of  pictures  portraying  a  human  and  a  robot 
completing  a  collaborative  task.  The  opening  scenes  will  include  a  wide  shot,  introducing  the 
positioning  of  the  human  and  robot  as  well  as  the  collaborative  task.  In  order  to  avoid  any  age  or 
gender  biases  of  the  human  actor,  only  the  actor’s  arms  and  hands  will  be  shown  while  aiding  in 
the collaborative task. The next shot will be a close-up of the robot’s trunk, arms, and face.
Finally, the human and the robot will complete the task. The final shot will include information
about  whether  the  task  was  performed  reliably.  If  the  task  was  performed  reliably,  the  final  shot 
will  show  the  successfully  completed  task.  If  the  task  was  not  performed  reliably,  the  final  shot 
will  show  the  final  outcome  being  incorrectly  completed  or  unfinished.  As  a  manipulation  check, 
participants  will  be  asked  to  respond  to  the  question,  “Was  the  task  portrayed  in  the  slideshow 
completed  successfully?”  after  viewing  the  slideshow. 

During  the  survey,  each  video  vignette  will  be  presented  in  the  center  of  the  screen.  After 
participants  view  the  video,  the  questions  and  rating  scales  will  appear  in  the  lower  half  of  the 
screen.  Scenarios  will  be  presented  in  a  random,  counterbalanced  order.  The  survey  will  be 
programmed  into  the  online  survey  program  Qualtrics  for  administration. 

Design  and  Procedure 

The study will use a 2 (participant age: younger, older) X 2 (robot age: young, old) X 2 (task
domain:  cognitive,  physical)  X  2  (robot  reliability:  low,  high)  mixed-model  design,  with 
participant  age  as  a  between-subjects  variable.  The  within-subjects  factors  are  manipulated  in  the 
factorial  survey.  The  task  domain  dimension  has  two  levels:  cognitive  and  physical.  These  levels 
were  selected  in  order  to  encompass  the  range  of  task  domains  within  the  HRI  literature.  Within 
those  two  domains,  participants  will  view  the  robots  doing  two  separate  tasks.  That  is,  the  robots 
will  complete  two  different  cognitive  tasks  and  two  different  physical  tasks  throughout  the 
survey.  The  two  cognitive  tasks  will  include  sorting  recycling  and  sorting  laundry.  The  two 
physical  tasks  will  include  moving  boxes  from  one  location  to  another  and  changing  a  light  bulb 
(Figure 4).

Following participant recruitment, the experimenter will email personalized Qualtrics
links  to  participants  in  order  for  them  to  complete  a  unique  version  of  the  factorial  survey.  The 
survey  will  be  completed  in  their  home  so  no  lab  visit  is  necessary.  Participants  may  work 
through  the  survey  at  their  own  pace.  However,  they  will  be  instructed  to  complete  the  survey  in 
one  sitting.  In  the  survey,  participants  will  complete  a  demographics  form  along  with  the 
vocabulary,  perceptual  speed,  and  other  individual  difference  measures.  Afterward,  participants 
will  view  randomly  presented  vignettes  and  answer  each  question  after  the  completion  of  the 
video.  After  making  their  trust,  causal  attribution,  and  capabilities  ratings,  participants  will  be 
asked  to  briefly  explain  their  ratings.  Participants  will  complete  the  CPRS  at  the  conclusion  of 
the survey. Finally, participants will be debriefed and compensated for their time.

Anticipated  Results 

First, outliers will be eliminated from the data. An outlier will be defined as a participant
whose score falls more than 3 standard deviations above or below the mean on a given measure. In order
to investigate whether manipulating a robot’s appearance, task, and reliability has an effect on
the  level  of  trust,  causal  attributions,  and  perception  of  capabilities,  we  will  use  a  2  (participant 
age:  younger,  older)  X  2  (robot  age:  young,  old)  X  2  (task  domain:  cognitive,  physical)  X  2 
(robot  reliability:  low,  high)  repeated  measures  analysis  of  variance  (ANOVA)  for  subjective 
trust,  causal  attributions,  and  perceived  capabilities.  We  expect  to  see  main  effects  of  robot 
appearance  such  that  when  the  robot  appears  older,  trust  will  be  lower,  causal  attributions  will  be 
more dispositional, and perceptions of capability will be reduced. It is also hypothesized that
there  will  be  a  significant  main  effect  of  reliability  such  that  when  reliability  is  low,  trust  and 
capabilities  should  decrease  and  attributions  will  become  more  dispositional.  Although  task 
domain will be treated as an exploratory variable, a main effect of task domain is hypothesized
such that trust and perceived capabilities will be highest when the robot performs physical tasks.
We expect a two-way interaction between reliability and robot age such that when reliability is low,
trust in the older adult automation may be lowest (Figure 5). Next, we expect a participant age by
robot  appearance  by  reliability  interaction  on  causal  attributions  such  that  causal  attributions  will 
be  most  dispositional  in  older  adult  participants  when  robot  appearance  is  older  and  performance 
is unreliable (Figure 6). Finally, we expect that older adult participants will make more
dispositional attributions across conditions and will have lower trust levels overall.
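
The outlier rule stated at the start of this section (scores more than 3 standard deviations from the mean on a measure) can be sketched as follows. This is an illustrative sketch only: the function name `flag_outliers`, the participant identifiers, and the use of the sample standard deviation are assumptions, since the text does not say which estimator will be used or whether the mean is computed within age group.

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_outliers(scores: Dict[str, float], threshold: float = 3.0) -> List[str]:
    """Return IDs whose score is more than `threshold` SDs from the mean.

    `scores` maps a participant ID to that participant's score on one
    measure and must contain at least two entries.
    """
    values = list(scores.values())
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (an assumption)
    if spread == 0:
        return []  # no variability, so nothing can be flagged
    return [
        pid for pid, value in scores.items()
        if abs(value - center) > threshold * spread
    ]


# Hypothetical trust ratings: 19 participants at 4.0 and one at 7.0.
ratings = {f"p{i}": 4.0 for i in range(1, 20)}
ratings["p20"] = 7.0
flagged = flag_outliers(ratings)
```

With small samples a single extreme score inflates the standard deviation, so a 3 SD criterion can fail to flag it; this is a property of the rule itself rather than of the sketch.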

Discussion 

This study offers a unique contribution by investigating a well-researched paradigm from
the social cognition and aging literatures, stereotypes, and applying it to a novel field, HRI. If our
hypotheses are supported and appearance of the robot has an effect on the levels of trust,
attribution, and perceived capabilities of robots, then these data could be useful for informing
future design of robotics. For example, the results of people’s judgments based on task domain
may suggest whether certain types of anthropomorphic aids are only appropriate in certain domains.
In particular, it may not be appropriate to have an older looking robot in manufacturing roles that
involve gross motor tasks such as heavy lifting, due to the influence its appearance may have
on workers’ perceptions of its abilities and their trust in the system. This study can also help
influence design in the sense that it further investigates which factors influence trust in
automation. If the goal is to maximize human trust, then it may be beneficial to use younger
looking anthropomorphism rather than older, while keeping reliability high. Overall, human-robot
collaboration will become more common in the home as well as at work, so it becomes
critical  to  better  understand  how  people  perceive  such  technologies. 

Figure 1: Young-adult appearance condition

Figure 2: Older-adult appearance condition

Figure 3: Example of Baxter stimuli (older-adult appearance condition)


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


AGE  STEREOTYPES  IN  HRI 


Figure 4: Task domains (cognitive, physical) and tasks (sorting recycling, sorting laundry, moving boxes, changing light bulb)

Figure 5: Reliability X robot age interaction on subjective trust

Figure 6: Participant age X robot appearance X reliability on causal attributions (panel: Participant Age: YA)

Abstract

Complacency  refers  to  a  type  of  automation  use  expressed  as  insufficient  monitoring  and 
verification  of  automated  functions.  Previous  studies  have  attempted  to  identify  the  age-related 
factors  that  influence  complacency  during  interaction  with  automation.  However,  little  is  known 
about  the  role  of  age-related  differences  in  working  memory  capacity  and  its  connection  to 
complacent behaviors. The current study aims to examine whether the working memory demand of
an automated task and age-related differences in cognitive ability influence complacency. Higher
degrees of automation (DOA) have been shown to reduce cognitive workload and may be used
to manipulate the working memory demand of a task. Thus, we hypothesize that a lower DOA (i.e.,
information acquisition stage with lower level) will demand more working memory than a higher
DOA (i.e., decision selection stage with higher level). Older adults are expected to have a greater
tendency to become complacent under a low DOA, and younger adults are expected to have a
greater tendency to become complacent under a high DOA.

Introduction

By the year 2050, the number of older adults (age 65 and over) in the world is estimated
to reach approximately 1.5 billion (WHO, 2011). A host of automated services and devices are or
will be designed to help older adults maintain independence (e.g., medication reminder apps).
Despite this availability of automation and its seeming utility for maintaining independent living
(Haigh & Yanco, 2002), research has shown that older adults may be more complacent with
automated systems compared to younger age groups (so-called automation-induced
complacency).

Automation-induced complacency is the “self-satisfaction that may result in non-vigilance
based on an unjustified assumption of satisfactory system state” (Billings, Lauber,
Funkhouser, Lyman, & Huff, 1976). It is the state in which a user fails to notice imperfect
automation. When the user poorly monitors the system and does not detect a fault, performance
consequences can result (Parasuraman & Manzey, 2010). For example, an older adult with
diabetes may monitor their blood glucose levels with an automated tool. If the older adult
perceives the device as reliable and trusts that the blood glucose readings are accurate, they may
rely on the reading even when the device starts to falter. As older adults begin to adopt
automated technologies, it is important to understand the age-related factors that contribute to
increased complacency and the performance costs associated with those behaviors.

Older  Adults,  Working  Memory,  and  Complacency 

Older adults have been found to be more complacent with automation relative to younger
adults (Ho et al., 2005b). Various studies have suggested several possible explanations for older
adults’ increased complacency. Person-related variables include higher
levels of trust (Johnson, Sanchez, Fisk, & Rogers, 2004; Pak, Fink, Price, Bass, & Sturre, 2012)
and age-related differences in abilities (e.g., working memory; Ho et al., 2005b), while
system-related variables include reliability of the automation (Sanchez, Fisk, & Rogers, 2004; Mayer,
2008; Olson, Fisk, & Rogers, 2009), cost of error (Ezer, 2006; Ezer, Fisk, & Rogers, 2008), cost
of verification (Ezer, Fisk, & Rogers, 2007; Ezer et al., 2008), expectations of system
performance (Mayer, 2008), and workload (McBride, Rogers, & Fisk, 2011).

Research investigating age differences in cognitive ability as a possible explanation for
changes in complacency found that in an automated task, older adults relied more on the
automation, committed more errors, had greater trust in the system, and were less confident in
their own abilities compared to younger adults (Ho et al., 2005b). The task also placed a high
demand on participants’ working memory, which is defined as the amount of information that
can be held in the mind or kept accessible at one time (Cowan, 2004). At the conclusion of each
study session, Ho et al. (2005b) had participants recite information from the task and found that
greater recall accuracy was correlated with fewer automation-related errors. Based on their
findings, they concluded that age-related differences in working memory might be a potential
reason for age differences in complacency, given the memory-dependent automated task. The
researchers proposed that because the younger adults could actively store and recall task
information when needed, they could more easily identify an automation failure compared to
their older counterparts.

Researchers have theorized that two main factors contribute to older adults’
complacent behavior with automated technologies (Ho, Kiff, Plocher, & Haigh, 2005a). The first
is that while using automation, older adults form an inaccurate mental representation of the
correct values used in the decision-making process due to reduced working memory capacity.
Working memory has been found to be a critical determinant in mental model acquisition


AGE AND AUTOMATION-INDUCED COMPLACENCY


(Gilbert & Rogers, 1999), where having an accurate mental model of the automation allows an
individual to better understand the behavior of the system. When older adults acquire an
inaccurate mental representation of the automation, they are likely to fail to anticipate and notice
system failures. The second is that due to their reduced working memory capacity,
older adults are unable to judge the accuracy of automation (Ho et al., 2005a). Diminished
working memory may prevent users from keeping an accurate running count of automation
failures. If older adults’ lower working memory inhibits detection of automation failures or
active recall of previously encountered failures, they will have a distorted view of system
reliability. When older adults perceive automation as more reliable than it is, they should rely
more and verify less (i.e., increased complacency).

In both cases, it is assumed that older adults’ relative complacency with automation is due to a
mismatch between the working memory demands of the task and the working memory capacity of
the person (Ho et al., 2005a). If working memory capacity plays such a central role in
automation complacency, we should observe the opposite relationship as well: reduced
complacency in older adults when the automation has been designed to demand relatively fewer
working memory resources (or working memory resources are less constrained). The design of
Ho et al.’s (2005b) study precludes this determination because it is unclear whether the high
working memory demands of the task or the degree of automation (DOA) contributed to the
difference in complacency.

In sum, several lines of research seem to point to the importance of individual and age-related
differences in working memory on automation behavior, particularly complacency. The
research shows that older adults are less sensitive to automation failures (McCarley, Wiegmann,
Wickens, & Smith, 2002) and frequently rely on the automated system when these malfunctions
occur (Ho et al., 2005b). Older adults have greater trust in automation, even when the system is
faulty to varying degrees (Mayer, 2008). They have lower working memory capacity, which
decreases the ability to retain knowledge about previous automation failures and overall system
reliability. When working memory demands are high (or working memory capacity is
constrained), complacency seems to increase.

How Complacency is Influenced by Automation-Related Factors

Reliability 

Automation reliability is the overall accuracy of the system and is an important factor in
automation-induced complacency because the number of errors the system produces can affect
dependence on automation.

Across different levels of reliability, age has been found to amplify trust in
automation. For instance, several studies found that higher reliability led to higher subjective
trust in the system for both age groups, but older adults had significantly higher trust than
younger adults (Sanchez et al., 2004; Ho et al., 2005b). Highly reliable automation is
problematic because users can become accustomed to its high level of performance and may not
expect it to fail.

Research on age differences in automation use has found that older adults tend to
overestimate the actual automation reliability (Olson et al., 2009). With known differences in
working memory, older adults have difficulty detecting errors and perceiving overall automation
performance. A combination of unnecessarily high trust in the system and limited working
memory may produce a lack of awareness of errors that is consistent with complacent behavior.

Workload

The workload or demand of a task can be taxing on an individual’s cognitive resources,
especially when a task is performed over a long period of time. For younger adults, greater
complacency has been shown in a multitask environment than in a single-task or monitoring role
(Parasuraman, Molloy, & Singh, 1993). Increased task demands can burden cognitive
resources and can limit the ability to maintain optimal manual performance. In order to alleviate
cognitive workload, the user can increase dependence on automation. If the individual has access
to greater cognitive resources, they may be able to limit their dependence on automation. Since
older adults have limited cognitive resources, the effect of task demand on complacency should
become greater as individuals age.

Under taxing conditions, older adults have a greater tendency to monitor automation, yet
fail to correctly identify automation errors (Ho et al., 2005b). Exerting more cognitive resources
to complete a task may lead the user to rely on automation once task demands become too
difficult to manage. Age differences in complacency have been observed under high
workload conditions, where older adults display greater complacency than younger adults
(McBride, 2010; Ho et al., 2005b). If workload only partially contributes to increases in
complacency, other age-related factors must be involved as well.

Working memory capacity has been found to significantly predict younger adult
performance in an automated task with varying workload (de Visser, Shaw, Mohamed-Ameen,
& Parasuraman, 2010). Since working memory plays a role in predicting performance, this
cognitive ability may explain some age-related differences in complacent behaviors.

Degree  of  Automation 

Automation comes in a variety of forms, which can execute different functions for the
user based on their capabilities and limitations. However, automation is not simply an all-or-none
concept because any individual task can feature varying degrees of automation (DOA) that take
into account the use of stages and levels (Wickens, Li, Santamaria, Sebok, & Sarter, 2010).

Parasuraman, Sheridan, and Wickens (2000) identified several stages of automation that
are based on an existing model of human information processing: information acquisition (stage
1), information analysis (stage 2), decision and action selection (stage 3), and action
implementation (stage 4). Each stage is designed to support a different aspect of the cognitive
process. For example, an individual with an unknown illness may input their symptoms into
an automated decision support tool to obtain a diagnosis. With a lower stage of automation, all
possible illnesses related to those symptoms would be provided and the user would make a
decision based on all the options listed. On the other hand, a higher stage of automation would
have the decision support tool provide the user with one or several optimal choices in order to
make the selection process more efficient.

Levels of automation differ from stages because they affect the role of humans and
automated systems in a given task. These levels exist on a spectrum of automation, where each
level between manual and fully automated changes the designation of authority for decision-making
tasks. A low level of automation grants authority to the human, making the person
primarily responsible for performing the task. In this case, the individual with the decision
support tool would be given little to no guidance and would have to choose the best option based
on the information provided. The roles are reversed under a high level of automation, where the
automation has more authority to make decisions for the user and complete the task. For
instance, the decision support tool might take the symptoms entered by the user and present them
with a single ideal diagnosis.

Along each stage of automation, varying levels can be applied to achieve a lower or
higher DOA. More automation or a greater DOA can be achieved with both higher levels within
a stage and later stages (Manzey, Reichenbach, & Onnasch, 2012). Also, higher DOAs are
associated with greater performance in addition to diminished workload (Wickens et al., 2010).
Since workload is reduced under a higher DOA, the automation is taking on more of the
cognitive demand for those tasks than the user. This leaves the user with more cognitive
resources at higher DOAs. Thus, working memory demands should lessen as the user moves
from a lower DOA towards a higher DOA.

Higher complacency can take the form of performance detriments under unreliable
systems and performance gains for increasingly reliable automation. For instance, a meta-analysis
found that higher DOAs lead to greater accuracy for younger adults, but only when the
automation performed optimally (Onnasch, Wickens, & Manzey, 2013). However, there was a
greater performance cost for imperfect automation as DOA increased. For younger adults, these
findings reveal differences in performance across DOAs, which seem to indicate changes in
complacent behavior. Comparable research comparing performance across lower and higher DOAs
has not been performed with the older adult population. With respect to the research by Ho et al.
(2005b), it is still unclear whether the high working memory demands of the task or the high
DOA contributed to age-related differences in complacency.

Current  Study 

The current study will further examine the role of age-related differences in working
memory and automation-induced complacency. If complacency is related to working memory,
then altering the working memory demands of the task (or varying the person’s working memory
capacity) should affect overall dependence on automation. Fortunately, the working memory
demands of automation are related to how much information in the automated task is presented
to the user (i.e., stage of automation) and the amount of authority allocated to the human or
automation within the task (i.e., level of automation) (Parasuraman, Sheridan, & Wickens, 2000;
Sheridan & Verplank, 1978). We can alter the working memory demands of the task by altering
the DOA presented to the user. Thus, we should expect to observe greater age-related differences
in complacency at degrees that increase working memory demands for the user. Ho, Wheatley,
and Scialfa (2005b) only used a high DOA (with concomitantly high working memory demands)
to examine differences in complacency between younger and older adults. Therefore, we will use
two DOAs that vary in working memory demand in order to investigate the effects of lower and
higher DOAs on complacency. Also, we will examine the predictive ability of working
memory capacity at each DOA. We expect that working memory capacity of each age group will
be relative to the working memory demand of the task. Thus, we anticipate working memory
capacity to be more predictive of performance for younger adults at a low DOA and for older
adults at a high DOA.

This study will utilize a low-fidelity targeting simulation, which has been used in prior
research to analyze accuracy and speed of user selections during interaction with DOA support
(Rovira, McGarry, & Parasuraman, 2007). Since higher DOAs have been linked with reduced
cognitive workload (Onnasch et al., 2013), we expect participants to perform better under higher
DOA (i.e., decision selection stage with higher level) than lower DOA (i.e., information
acquisition stage with lower level). Based on existing literature, we anticipate a main effect of
age group on task accuracy and completion time, where younger adults should outperform older
adults. We can infer the extent to which participants are complacent by analyzing their pattern of
performance at different reliability levels. A greater difference between performance with
unreliable and reliable automation indicates higher complacency because the user is relying
heavily on the system without monitoring for failures. Therefore, we will examine task accuracy
for unreliable and reliable trials across DOAs and age groups. We hypothesize a lower DOA will
result in greater complacency for older adults and a higher DOA will result in greater
complacency for younger adults. We anticipate this result because the high demand of a low
DOA should limit older adults’ ability to verify information provided by the automated system.
In terms of the high DOA, lower task demands should lull younger adults into depending on the
system instead of checking for errors.
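
As an illustration only, the inference described above (a larger gap between accuracy on reliable and unreliable trials indicating greater complacency) can be sketched in Python. The trial record fields (`reliable`, `optimal`) and the function names are assumptions introduced for this sketch and are not part of the study materials.

```python
from statistics import mean


def accuracy(trials):
    """Proportion of trials on which the participant gave the optimal response."""
    return mean(1.0 if t["optimal"] else 0.0 for t in trials)


def complacency_index(trials):
    """Accuracy on reliable trials minus accuracy on unreliable trials.

    Larger values suggest heavier reliance on the aid without verification.
    """
    reliable = [t for t in trials if t["reliable"]]
    unreliable = [t for t in trials if not t["reliable"]]
    return accuracy(reliable) - accuracy(unreliable)


# Hypothetical block: 24 reliable trials (22 correct), 6 unreliable trials (3 correct).
example_block = (
    [{"reliable": True, "optimal": True}] * 22
    + [{"reliable": True, "optimal": False}] * 2
    + [{"reliable": False, "optimal": True}] * 3
    + [{"reliable": False, "optimal": False}] * 3
)
index = complacency_index(example_block)
```

In this hypothetical block the index is 22/24 minus 3/6. Whether a simple difference score or the reliability factor in an analysis of variance is used to test the hypothesis is a choice left to the analysis plan.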

Method 

Participants 

Thirty-six  undergraduate  students  will  be  recruited  for  this  research  and  given  course 
credit  for  participation.  Thirty-six  older  adults  (ages  65-75)  from  the  local  area  will  be  recruited 
and  will  be  compensated  $25  for  their  time. 

Task 

The tasks for this study will be adapted from prior research that uses an automated
system in the context of a low-fidelity UAV simulation (Rovira et al., 2007). The primary task
for this study will be to quickly and accurately find the closest combination of friendly (green
units) and enemy units (red units) in terms of distance apart on the grid (Figure 1). Automation
will be presented as a table, which will display the distances and unit combinations needed by
participants to complete the primary task. The secondary task will consist of checking for a
specific call sign and clicking a corresponding button when it appears on screen. The call sign
consists of a single word and number combination (e.g., Hunter-6). The program will randomly
alternate between 14 different call signs every 5 seconds as the participant completes the primary
task.
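
The primary task can be illustrated with a short Python sketch that finds the closest friendly and enemy pair. The unit names, the coordinate representation, and the use of straight-line (Euclidean) distance are assumptions made for this sketch; the source does not specify how distance on the grid is computed.

```python
from itertools import product
from math import hypot


def closest_pair(friendly, enemy):
    """Return (friendly_name, enemy_name, distance) for the closest pairing.

    friendly, enemy: dicts mapping a unit name to its (x, y) grid position.
    Distance is assumed to be Euclidean for illustration.
    """
    candidates = (
        (f_name, e_name, hypot(fx - ex, fy - ey))
        for (f_name, (fx, fy)), (e_name, (ex, ey)) in product(
            friendly.items(), enemy.items()
        )
    )
    return min(candidates, key=lambda row: row[2])


# Hypothetical grid positions.
friendly_units = {"F1": (0, 0), "F2": (5, 5)}
enemy_units = {"E1": (3, 4), "E2": (6, 5)}
best = closest_pair(friendly_units, enemy_units)
```

Here `best` is the pairing of F2 with E2, which are one grid unit apart.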

Participants  will  complete  blocks  of  trials,  where  each  block  will  consist  of  a  different 
DOA  and  workload  level  (Appendix  A).  The  DOA  manipulation  will  change  the  stage  and  level 
of  the  automation  table  used  in  the  task.  The  lower  DOA  will  use  the  information  acquisition 
stage,  which  presents  all  possible  friendly  and  enemy  unit  combinations  in  each  grid,  with  a  low 
level  of  automation  that  does  not  sort  the  information  in  any  meaningful  way.  The  higher  DOA 
will  use  the  decision  and  action  selection  stage,  which  will  present  the  top  3  friendly  and  enemy 
unit  combinations.  In  addition,  this  DOA  will  feature  a  high  level  of  automation  that  will  sort  the 
information  based  on  importance,  so  that  the  shortest  distance  combination  is  presented  at  the 
top. The workload manipulation will change the number of units presented in the grid. Low
workload will present 3 friendly and 3 enemy units, while high workload will show 6 friendly
and  6  enemy  units.  Each  combination  of  DOA  and  workload  will  be  presented  twice  for  a  total  of 
8  blocks  and  240  trials.  Participants  will  complete  the  DOA  and  workload  manipulation  pairings 
in  a  random  counterbalanced  order. 
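
The two DOA conditions and the block structure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the row format `(friendly, enemy, distance)` and all function names are hypothetical, and the ordering of blocks (the counterbalancing scheme) is not modeled because the source does not specify it.

```python
from itertools import product

TRIALS_PER_BLOCK = 30
UNITS_PER_SIDE = {"low": 3, "high": 6}  # workload level -> friendly (and enemy) units


def automation_table(rows, doa):
    """Build the support table shown to the participant.

    rows: list of (friendly, enemy, distance) tuples in arbitrary order.
    Low DOA: every combination, left unsorted.
    High DOA: only the 3 shortest combinations, shortest first.
    """
    if doa == "low":
        return list(rows)
    if doa == "high":
        return sorted(rows, key=lambda row: row[2])[:3]
    raise ValueError(f"unknown DOA: {doa!r}")


def block_list(repetitions=2):
    """Each DOA x workload pairing presented `repetitions` times."""
    cells = list(product(("low", "high"), ("low", "high")))
    return [
        {"doa": doa, "workload": workload}
        for _ in range(repetitions)
        for doa, workload in cells
    ]


blocks = block_list()
total_trials = len(blocks) * TRIALS_PER_BLOCK  # 8 blocks x 30 trials = 240
rows_low_workload = UNITS_PER_SIDE["low"] ** 2    # 9 combinations in the low DOA table
rows_high_workload = UNITS_PER_SIDE["high"] ** 2  # 36 combinations in the low DOA table
```

The last two values show why the low DOA is expected to be more demanding: under high workload the unsorted table holds 36 rows, whereas the high DOA table always holds 3.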

The  overall  automation  reliability  will  be  set  at  80%,  which  is  above  the  threshold  for 
imperfect  reliability  acceptance  (Wickens  &  Dixon,  2007).  In  each  block  of  30  trials,  24  trials 
will  be  reliable  and  the  remaining  6  trials  will  be  unreliable.  An  unreliable  trial  will  contain 
inflated  distance  values  between  units  or  incorrect  optimal  suggestions  within  the  automation 
support table. The first automation failure will not occur until the 10th trial, so that users can
rebuild  trust  after  each  block.  Also,  subsequent  automation  failures  will  be  distributed  randomly 
throughout  the  remaining  trials. 
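
One way to generate such a schedule is sketched below. It adopts one reading of the description, namely that the first failure falls exactly on trial 10 and the other five failures are drawn at random from trials 11 to 30; that reading, and the function and parameter names, are assumptions of this sketch.

```python
import random


def reliability_schedule(n_trials=30, n_failures=6, first_failure=10, rng=None):
    """Return a list of booleans, one per trial (True = reliable trial).

    Assumes the first failure is placed on trial `first_failure` and the
    remaining failures are sampled without replacement from later trials.
    """
    rng = rng or random.Random()
    later_trials = range(first_failure + 1, n_trials + 1)
    failures = {first_failure, *rng.sample(later_trials, n_failures - 1)}
    return [trial not in failures for trial in range(1, n_trials + 1)]


schedule = reliability_schedule(rng=random.Random(0))
reliability = sum(schedule) / len(schedule)  # 24 of 30 trials reliable
```

With 24 of 30 trials reliable, each block has the stated 80% reliability regardless of where the later failures land.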

Measures

Cognitive Abilities. The following abilities will be assessed: perceptual speed (digit-symbol
substitution; Wechsler, 1997), vocabulary (Shipley vocabulary; Shipley, 1986), and
working memory (automated operation span (Aospan); Unsworth, Heitz, Schrock, & Engle,
2005). Instructions for the Aospan task can be found in Appendix B. These measures were
chosen because they are reliable indicators of their respective abilities (e.g., Czaja et al., 2006).
The cognitive ability measures were selected to confirm age differences in fluid and crystallized
intelligence. Specifically, the Aospan will be used to detect age group differences and test the
predictive ability of working memory capacity on performance at two DOAs. Research has
shown the Aospan to be a reliable and valid indicator of working memory capacity (Unsworth et
al., 2005). This version of the Ospan is preferred because the task is fully computerized, the
participant can complete the task independently of the experimenter, and the experimenter can
collect data from several participants simultaneously. In the Aospan task, participants will be
instructed to complete simple math problems while remembering the order of individual letters
that will be presented after solving each problem. Participants will need to correctly answer at
least 85% of the math problems and recall as many letters as possible. The Aospan score will
consist of the sum of all perfectly recalled letter sets, where higher scores indicate greater
working memory capacity.
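
The scoring rule can be sketched as follows. The sketch interprets "sum of all perfectly recalled letter sets" as the sum of the set sizes of those sets recalled in full and in the correct order; that interpretation, the data layout, and the function names are assumptions made for illustration.

```python
def aospan_score(letter_sets):
    """Sum the sizes of all letter sets recalled perfectly and in order.

    letter_sets: list of (presented, recalled) strings, one pair per set.
    """
    return sum(len(presented) for presented, recalled in letter_sets
               if presented == recalled)


def meets_math_criterion(n_correct, n_problems, threshold=0.85):
    """True if the participant answered at least 85% of math problems correctly."""
    return n_correct / n_problems >= threshold


# Hypothetical session: the second set was recalled in the wrong order.
session = [("FJK", "FJK"), ("QRSL", "QRLS"), ("HN", "HN")]
score = aospan_score(session)
```

In this hypothetical session the score is 5, since only the three-letter and two-letter sets were recalled perfectly.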

Subjective Workload. Subjective workload will be measured with the NASA-Task Load
Index (NASA-TLX) (Prichard, Bizo, & Stratford, 2011). A computer version of the task will
present 6 items that constitute overall workload: mental demand, physical demand, temporal
demand, performance, effort and frustration. Each item is rated on a Likert scale of 0 to 20,
where higher values indicate increased workload. Subjective workload will be calculated as the
average of the 6 combined items. The NASA-TLX was chosen as a manipulation check for
automation stage and age differences in perceived workload.
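
The unweighted average described above can be sketched as follows; the dictionary layout and function name are assumptions introduced for this illustration.

```python
TLX_ITEMS = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "performance",
    "effort",
    "frustration",
)


def tlx_workload(ratings):
    """Average the six NASA-TLX item ratings (each on a 0 to 20 scale)."""
    for item in TLX_ITEMS:
        if not 0 <= ratings[item] <= 20:
            raise ValueError(f"rating out of range for {item!r}")
    return sum(ratings[item] for item in TLX_ITEMS) / len(TLX_ITEMS)


# Hypothetical ratings collected after one block.
block_ratings = dict(zip(TLX_ITEMS, (12, 4, 10, 8, 14, 6)))
workload = tlx_workload(block_ratings)
```

For these hypothetical ratings the six items sum to 54, giving an average workload of 9.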

General  Trust  in  Automation.  Trust  towards  everyday  automation  will  be  measured  with 
a  survey  developed  by  Jian,  Bisantz,  and  Drury  (2000)  (Appendix  C).  This  measure  is  a  12-item 
survey that is rated on a Likert scale of 1 (not at all) to 7 (extremely). The first 5 questions are
negatively  framed  and  the  last  7  are  positively  framed.  Trust  is  the  sum  of  normal  and  reverse 
coded  responses,  for  a  possible  total  score  of  84.  Higher  scores  on  this  measure  indicate  greater 
trust  in  the  automated  system.  The  measure  will  be  analyzed  for  age-related  differences  in  trust 
towards  automation. 
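
The scoring of this 12-item survey can be sketched as follows. Reverse coding is assumed to map a response r to 8 minus r on the 1 to 7 scale, which is the usual convention but is not stated in the source; the function name is likewise hypothetical.

```python
def general_trust_score(responses):
    """Score the 12-item trust survey (items rated 1 to 7).

    The first 5 items are negatively framed and are reverse coded as 8 - r;
    the last 7 items are summed as given. The maximum total is 84.
    """
    if len(responses) != 12:
        raise ValueError("expected 12 responses")
    if any(not 1 <= r <= 7 for r in responses):
        raise ValueError("responses must be between 1 and 7")
    reverse_coded = [8 - r for r in responses[:5]]
    return sum(reverse_coded) + sum(responses[5:])


highest = general_trust_score([1] * 5 + [7] * 7)
lowest = general_trust_score([7] * 5 + [1] * 7)
```

Under this coding the highest possible score is 84, matching the total given above, and the lowest possible score is 12.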

Subjective Trust. We will use a survey adapted from Lee and Moray (1992) to measure
subjective trust specifically towards each DOA and working memory manipulation (Appendix
D). This trust measure will pose 4 questions, rated from 0 (not at all) to 100 (extremely), about
the automated aid used in each set of trials. For example, the questions will ask participants to
answer how much they trusted, relied upon, or benefited from using the automated aid. The
overall score will consist of the sum of average scores on questions 1, 2, and 4, where higher
scores will indicate higher trust. Additionally, this questionnaire will be used to examine trust
differences between age groups, workload, and DOA.

Complacency Potential. The Complacency Potential Rating Scale (CPRS) measures
individual potential complacency behavior (Singh, Molloy, & Parasuraman, 1993) (Appendix E).
This 20-item scale contains 4 filler items and is rated on a Likert scale of 1 (strongly disagree) to
5 (strongly agree). The CPRS score is a sum of the remaining responses, where higher values on
this measure indicate an increased complacency potential. The CPRS was selected in order to
predict participant complacency within the task. Also, the measure serves to verify age
differences in complacency potential.
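
A sketch of the CPRS scoring follows. The source does not say which 4 of the 20 items are fillers, so the filler positions are passed in as a parameter and the positions used in the example are purely hypothetical.

```python
def cprs_score(responses, filler_items):
    """Sum the 16 scored CPRS items (rated 1 to 5), skipping the 4 fillers.

    responses: list of 20 ratings in item order.
    filler_items: set of 1-based item numbers to exclude (must be supplied;
    the actual filler positions are not given in the text above).
    """
    if len(responses) != 20 or len(filler_items) != 4:
        raise ValueError("expected 20 responses and 4 filler items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be between 1 and 5")
    return sum(r for item, r in enumerate(responses, start=1)
               if item not in filler_items)


# Hypothetical filler positions, for illustration only.
example_fillers = {1, 2, 3, 4}
neutral_score = cprs_score([3] * 20, example_fillers)
```

With 16 scored items, totals range from 16 to 80; a participant answering 3 throughout would score 48.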

Design 

The  current  study  is  a  2  (age  group:  young  or  old)  x  2  (DOA:  low  or  high)  x  2 
(automation  reliability:  unreliable  or  reliable)  x  2  (workload:  low  or  high)  mixed-subjects  design. 
Age  group  will  be  a  between-subjects  independent  variable.  These  groups  will  differ  in  working 
memory  capacity  because  older  adults  have  been  shown  to  have  less  of  this  ability  than  younger 
adults.  DOA,  automation  reliability,  and  workload  will  be  within-subjects  independent  variables. 
The  DOAs  serve  as  our  working  memory  demand  manipulation. 
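
The factorial structure can be enumerated with a short sketch. The factor and level labels follow the design statement above; the variable and function names are assumptions of this illustration.

```python
from itertools import product

BETWEEN_SUBJECTS = {"age_group": ("young", "old")}
WITHIN_SUBJECTS = {
    "doa": ("low", "high"),
    "reliability": ("unreliable", "reliable"),
    "workload": ("low", "high"),
}


def design_cells(*factor_sets):
    """Return every combination of levels across the given factor dictionaries."""
    factors = {}
    for factor_set in factor_sets:
        factors.update(factor_set)
    return [dict(zip(factors, levels)) for levels in product(*factors.values())]


all_cells = design_cells(BETWEEN_SUBJECTS, WITHIN_SUBJECTS)  # 16 cells overall
cells_per_participant = design_cells(WITHIN_SUBJECTS)        # 8 cells per person
```

Each participant therefore contributes data to 8 within-subjects cells, and the full design has 16 cells across the two age groups.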

The dependent variables will be targeting task accuracy, targeting task completion time,
complacency potential, subjective trust, subjective workload, general trust in automation, and
working memory capacity. Targeting task accuracy will be measured by the mean rate of
optimal responses for each automation block. An optimal response is the identification of the
closest pair of friendly and enemy units on the targeting task grid. Targeting task time will be
measured by the average duration (in milliseconds) it takes participants to complete each trial.
Complacency potential will consist of scores on the CPRS. Subjective trust will be
measured by the sum of subjective ratings on the trust questionnaire for each combination of
DOA and workload level. Subjective workload will consist of an average of the 6 items on the
NASA-TLX and will be measured for each combination of DOA and workload level. General
trust in automation will be measured with the corresponding scale based on ratings of trust
towards everyday automated technologies. Working memory capacity will be measured as the
sum of perfectly recalled sets of letters on the Aospan task.

Procedure

Participants will be seated at individual PCs and asked to provide informed
consent. They will be instructed to complete the demographics form and the cognitive ability
measures. The experimenter will then tell participants to open and observe the targeting task
instructions  screen.  Participants  will  be  told  the  following:  “In  this  experiment,  you  will  have 
two  tasks.  The  first  task  will  be  to  monitor  the  communications  panel  for  the  call  sign  Hunter-6. 
When  you  see  Hunter-6,  you  should  click  the  answer  button.  The  second  task  will  be  to  target 
enemy  units  with  the  closest  friendly  unit  as  quickly  as  you  can.  You  will  do  this  by  first 
selecting  a  friendly  unit  from  the  list  of  buttons  in  the  targeting  input  and  then  select  an  enemy 
target  from  the  list  of  buttons  and  click  ok.  The  computer  aid  will  sometimes  help  you  with  this 
task  by  showing  you  the  distances  between  friendly  and  enemy  units.  Sometimes,  two  sets  of 
targets  will  have  the  same  distance.  In  this  case,  you  will  pick  the  one  with  the  shortest  distance 
to  the  headquarters.  Sometimes  the  computer  aid  will  give  you  lots  of  information,  other  times  it 
will  give  you  much  less  information.  The  computer  aid  can  be  very  reliable  but  it  is  not  perfect 
all  the  time.”  After  these  instructions,  the  experimenter  will  answer  questions  before  the 
participants  begin  the  task. 
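
The rule participants are asked to follow (closest friendly-enemy pair, with ties broken by the shortest distance to headquarters) can be sketched in Python as follows. The row layout and function names are hypothetical, and the distances in the usage note are illustrative values only.

```python
def optimal_pair(rows):
    """Return the (friendly, enemy) pair a participant should select.

    Assumed row layout: (friendly, enemy, friendly_to_enemy, friendly_to_hq).
    The pair with the smallest friendly-to-enemy distance wins; ties are
    broken by the smaller friendly-to-headquarters distance.
    """
    best = min(rows, key=lambda row: (row[2], row[3]))
    return best[0], best[1]


def is_optimal(response, rows):
    """True when the participant's (friendly, enemy) response is optimal."""
    return tuple(response) == optimal_pair(rows)
```

With rows `("A2", "E3", 3, 5)`, `("A3", "E2", 3, 9)`, and `("A1", "E3", 4, 8)`, the first two pairs tie on distance, and A2-E3 is optimal because A2 is closer to headquarters.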

As the participants complete the tasks, the units in the grid and the values within the
automation table will change for each subsequent trial. Between each block of trials, participants
will fill out the NASA-TLX and a brief subjective trust measure. During the experiment, a screen
will  appear  to  indicate  when  participants  linger  too  long  on  a  particular  trial.  If  participants  do 
not  input  friendly  and  enemy  unit  combinations  within  the  set  time  limit,  the  program  will 
automatically  continue  to  the  next  trial.  Younger  adults  will  have  10  seconds  to  complete  each 
trial,  while  older  adults  will  have  20  seconds.  Older  adults  will  have  more  time  for  the  task 
because of normative age-related differences in psychomotor speed (Salthouse, 1985). Time
limits were based on an analysis of incomplete trials from pilot testing the task with each age
group. 

Participants  will  proceed  through  each  block  of  trials  and  the  computer  will  notify  them 
when  they  are  finished.  When  they  complete  the  automation  program,  participants  will  be 
presented  with  a  general  subjective  measure  of  trust  in  automation  and  the  CPRS.  At  the 
conclusion  of  the  experiment,  participants  will  be  debriefed  and  provided  compensation  for  their 
time. 

Expected  Results 

To begin the analysis, outliers will be eliminated from the data. An outlier will be defined
as a participant who scored more than 3 standard deviations above or below the mean on a
particular measure. In order to examine the differences in working memory demands for each
DOA, we will regress targeting task accuracy on working memory capacity.

Since working memory capacity has already been found to predict younger adult performance
while using automation (de Visser et al., 2010), we will examine the slopes of younger and older
adults at each DOA. We expect working memory capacity to be more predictive of task accuracy
for younger adults at a low DOA (Figures 2-3). This result is anticipated because lower DOAs
have been associated with greater cognitive workload (Wickens et al., 2010). At a high
DOA, we expect working memory capacity to be more predictive of task accuracy for older
adults.
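
As a rough illustration of these two steps, the sketch below flags participants beyond 3 standard deviations on a measure and computes an ordinary least-squares slope of accuracy on working memory capacity. The function names and data layout are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def outlier_ids(scores_by_id, cutoff=3.0):
    """Return IDs whose score lies more than `cutoff` SDs from the mean."""
    values = list(scores_by_id.values())
    m, s = mean(values), stdev(values)  # sample SD (an assumption)
    return [pid for pid, x in scores_by_id.items() if abs(x - m) > cutoff * s]


def ols_slope(wm_capacity, accuracy):
    """Slope from regressing accuracy (y) on working memory capacity (x)."""
    mx, my = mean(wm_capacity), mean(accuracy)
    sxy = sum((x - mx) * (y - my) for x, y in zip(wm_capacity, accuracy))
    sxx = sum((x - mx) ** 2 for x in wm_capacity)
    return sxy / sxx
```

Under this sketch, the slope would be computed separately for each age group at each DOA after outliers are dropped, and the resulting slopes compared.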

We will further investigate the effect of our manipulations on performance by conducting
a 2 (age: young or old) x 2 (DOA: low or high) x 2 (workload: low or high) repeated measures
analysis of variance (ANOVA) for targeting task accuracy and task time. We expect a main
effect of age such that younger adults will perform the task more quickly and more accurately than
older adults. We expect a main effect of DOA such that performance with the high DOA will be
significantly better than with the low DOA. We anticipate a main effect of workload, where
performance under low workload will be significantly better than under high workload. Graphical
representations of these main effects can be found in Figure 4 and Figure 5.

In order to examine differences in complacent behavior, we will perform a 2 (age: young
or old) x 2 (DOA: low or high) x 2 (automation reliability: reliable or unreliable automation)
repeated measures ANOVA for targeting task accuracy. We can infer the extent to which
participants are complacent by analyzing their pattern of performance at different reliability
levels. A greater difference between performance with unreliable and reliable automation
indicates higher complacency because the user is relying heavily on the system without
monitoring for failures. From the analysis, we anticipate a 3-way interaction such that the
interaction between age and DOA will change as a function of reliability (see Figure 6 and
Figure 7).
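
The inference described above, where a larger accuracy gap between reliable and unreliable automation indicates more complacency, can be sketched as a simple difference score averaged within each age by DOA cell. The record fields and function name are hypothetical. This is a descriptive summary only and does not replace the planned ANOVA.

```python
from collections import defaultdict
from statistics import mean


def complacency_by_cell(records):
    """Mean (reliable minus unreliable) accuracy gap per (age, DOA) cell.

    Assumed record layout: dicts with keys "age", "doa", "acc_reliable",
    and "acc_unreliable", one record per participant and DOA level.
    """
    cells = defaultdict(list)
    for r in records:
        gap = r["acc_reliable"] - r["acc_unreliable"]
        cells[(r["age"], r["doa"])].append(gap)
    return {cell: mean(gaps) for cell, gaps in cells.items()}
```

A larger value in a cell would suggest that participants in that cell relied on the aid without checking it, since their accuracy dropped more when the aid failed.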

We will analyze the scores on each subjective measure used in the study. We will
perform 2 (age: young or old) x 2 (DOA: low or high) repeated measures ANOVAs to analyze
differences in subjective trust and workload. We expect a main effect of age, where older adults
will report greater workload and trust than younger adults. We expect a main effect of DOA such
that the higher DOA will produce greater subjective trust and diminished workload. Graphical
representations of these main effects can be found in Figure 8 and Figure 9. Additional measures,
including complacency potential and general trust in automation, will be analyzed with
independent samples t-tests to compare scores across age groups. We expect that older adults
will have greater complacency potential and greater overall trust in automation than younger
adults.
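
For the age-group comparisons, a minimal sketch of an independent samples t statistic is given below. It assumes the classic pooled-variance (Student) form, which the text does not specify, and the function name is hypothetical. It returns only the statistic and degrees of freedom, not a p-value.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

Here `group_a` and `group_b` would hold, for example, the CPRS scores of older and younger adults. A positive t would be in the direction expected above when older adults are passed first.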

Discussion

It  is  important  to  understand  the  factors  that  contribute  to  complacent  behaviors  within 
the human-automation interaction. For the design of automated systems, it is necessary to
consider  factors  such  as  reliability  and  workload.  Since  high  system  reliability  is  common  in 
most  automated  technologies  today  and  thus  makes  users  more  susceptible  to  complacent 
behaviors,  it  is  essential  to  alert  the  user  to  potential  automation-related  failures  that  can  occur. 
In  terms  of  task  demands,  keeping  the  task  manageable  for  the  user  is  critical  for  detecting  and 
correcting  inaccuracies. 

Designers  should  select  the  appropriate  DOA  for  the  known  population  of  users. 
Specifically,  the  design  of  automated  tasks  should  consider  the  age  of  the  user.  Automation  can 
be  presented  in  many  different  ways  and  can  perform  a  wide  range  of  tasks  for  the  user. 
Depending  on  the  type  of  task,  some  forms  may  demand  more  working  memory  than  others. 
Limiting working memory demand through automation can be beneficial to both younger and
older  adults.  This  may  help  to  reduce  the  occurrence  of  complacent  behaviors  during  interaction 
with  automation. 

Figure 1. Screenshot of targeting task. Features communications panel (top-left), targeting input
panel (top-left), automation table (bottom-left), and grid (right).

Figure 2. Linear regression between working memory capacity and targeting task accuracy (low
DOA).

Figure 3. Linear regression between working memory capacity and targeting task accuracy (high
DOA).

Figure 4. Graph of targeting task accuracy for each age group, DOA, and level of workload.

Figure 5. Graph of targeting task time for each age group, DOA, and level of workload.

Figure 6. Graph of targeting task accuracy for younger adult participants.

Figure 7. Graph of targeting task accuracy for older adult participants.

Figure 8. Graph of subjective trust reported at each DOA.

Figure 9. Graph of subjective workload reported at each DOA.

Appendix A: Examples of DOA and Workload Manipulations

Reliable, low DOA, and low workload trial example:

Reliable, low DOA, and high workload trial example:

Reliable, high DOA, and low workload trial example:

Communications: FREEDOM-6 [Answer]
Targeting: Friendly (artillery) and Enemy selection buttons [OK]

DISTANCE
Friendly (A)   Enemy (E)   A to E   A to HQ
A2             E3          003      005
A3             E2          003      009
A1             E3          004      008

Reliable, high DOA, and high workload trial example:

Communications: ACCORN-6 [Answer]
Targeting: Friendly (artillery): A6; Enemy: E2, E3, E4, E5, E6 [OK]

DISTANCE
Friendly (A)   Enemy (E)   A to E   A to HQ
A6             E5          004      006
A1             E1          004      011
A3             E3          004      014

Appendix B: Automated Operation Span Task

Phase  1:  Directions  for  Letter  Memorization  Practice  Phase 

•  In  this  experiment,  you  will  try  to  memorize  letters  you  see  on  the  screen  while  you  also 
solve  simple  math  problems. 

•  You  will  begin  by  practicing  the  letter  part  of  the  experiment. 

• For the practice set, letters will appear on the screen one at a time. Try to remember each
letter in the order presented.

•  After  2-3  letters  have  been  shown,  you  will  see  a  screen  listing  12  possible  letters. 

•  Your  job  is  to  select  each  letter  in  the  order  presented.  To  do  this,  use  the  mouse  to  select 
each  letter.  The  letters  you  select  will  appear  at  the  top  of  the  screen. 

•  When  you  have  selected  all  of  the  letters,  and  they  are  in  the  correct  order,  hit  the  DONE 
box  at  the  bottom  right  of  the  screen. 

• If you make a mistake, hit the CLEAR button to start over.

•  If  you  forget  one  of  the  letters,  click  the  ?  (question  mark)  button  to  mark  the  spot  for  the 
missing  letter. 

•  Remember,  it  is  very  important  to  get  the  letters  in  the  same  order  as  you  see  them.  If  you 
forget  one,  use  the  ?  button  to  mark  the  position. 

•  Do  you  have  any  questions  so  far?  When  you’re  ready,  click  the  button  below  to  start  the 
letter  practice. 

Phase  2:  Directions  for  Mental  Math  Practice  Phase 

• Now you will practice doing the math part of the experiment. A math problem will
appear on the screen like this: (2 * 1) + 1 = ?

• As soon as you see the math problem, you should compute the correct answer. In the
above problem, the answer 3 is correct.

•  When  you  know  the  correct  answer,  you  will  click  the  OK  button  with  your  mouse. 

• You will see a number displayed on the next screen, along with a button marked TRUE
and a button marked FALSE.

• If the number on the screen is the correct answer to the math problem, click on the TRUE
box with the mouse. If the number is not the correct answer, click on the FALSE box. For
example, if you see the problem: (2 * 2) + 1 = ? and the number on the following screen
is 5, click the TRUE box, because the answer is correct. If you see the problem: (2 * 2) + 1
= ? and the number on the next screen is 6, click the FALSE box, because the correct
answer is 5, not 6. After you click on one of the boxes, the computer will tell you if you
made the right choice.

•  It  is  VERY  important  that  you  get  the  math  problems  correct. 

•  It  is  also  important  that  you  try  and  solve  the  problem  as  quickly  as  you  can. 

•  Do  you  have  any  questions?  When  you’re  ready,  click  the  mouse  to  try  some  practice 
problems. 

Phase  3:  Directions  for  Combined  Letter  Memorization  and  Mental  Math  Phase 

•  Now  you  will  practice  doing  both  parts  of  the  experiment  at  the  same  time.  In  the  next 
practice  set,  you  will  be  given  one  of  the  math  problems. 

•  Once  you  make  your  decision  about  the  math  problem,  a  letter  will  appear  on  the  screen. 
Try  and  remember  the  letter. 

• In the previous section where you only solved math problems, the computer computed
your average time to solve the problems.

• If you take longer than your average time, the computer will automatically move you
onto the next letter part, thus skipping the True or False part and will count that problem
as a math error.

•  Therefore,  it  is  VERY  important  to  solve  the  problems  as  quickly  and  as  accurately  as 
possible. 

•  After  the  letter  goes  away,  another  math  problem  will  appear,  and  then  another  letter. 

•  At  the  end  of  each  set  of  letters  and  math  problems,  a  recall  screen  will  appear.  Use  the 
mouse  to  select  the  letters  you  just  saw. 

•  Try  your  best  to  get  the  letters  in  the  correct  order.  It  is  important  to  work  QUICKLY  and 
ACCURATELY  on  the  math.  Make  sure  you  know  the  answer  to  the  math  problem 
before  clicking  to  the  next  screen. 

•  You  will  not  be  told  if  your  answer  to  the  math  problem  is  correct.  After  the  recall  screen, 
you  will  be  given  feedback  about  your  performance  regarding  both  the  number  of  letters 
recalled  and  the  percent  correct  on  the  math  problems. 

•  During  the  feedback,  you  will  also  see  your  percent  correct  for  the  math  problems  for  the 
entire  experiment. 

•  It  is  VERY  important  for  you  to  keep  this  at  least  at  85%. 

• For our purposes, we can only use data where the participant was at least 85% accurate
on the math.

•  Therefore,  you  must  perform  at  least  at  85%  on  the  math  problems  WHILE  doing  your 
best  to  recall  as  many  letters  as  possible. 
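
The timing and accuracy rules stated in these instructions can be summarized in a short Python sketch. The function names are hypothetical, and the time limit is taken to be the participant's plain average practice time because that is how the instructions describe it. The actual task software may use a different threshold.

```python
from statistics import mean


def math_time_limit(practice_times_ms):
    """Per-participant limit: average solution time from the math practice."""
    return mean(practice_times_ms)


def score_math_trial(response_time_ms, answered_correctly, limit_ms):
    """True when the problem counts as correct; timeouts count as math errors."""
    if response_time_ms > limit_ms:
        return False
    return answered_correctly


def meets_math_criterion(trial_outcomes, criterion=0.85):
    """True when the proportion of correct math problems is at least 85%."""
    return sum(trial_outcomes) / len(trial_outcomes) >= criterion
```

Under these rules, a participant with 17 of 20 math problems counted as correct (85%) would be retained, while one with 16 of 20 (80%) would not.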

Appendix C: General Rating of Trust in Automation

Below are several statements about the targeting aid that you just used (referred to as the "system").
Please rate your feelings about the aid from "not at all" to "extremely" (click one of the 7 buttons in a row for each question).

Each statement is rated on a 7-point scale from 1 (Not at all) to 7 (Extremely).

1. The system is deceptive
2. The system behaves in an underhanded manner
3. I am suspicious of the system's intent, action, or outputs
4. I am wary of the system
5. The system's actions will have a harmful or injurious outcome
6. I am confident in the system
7. The system provides security
12. I am familiar with the system

Appendix D: Subjective Trust in the Automated Aid

To what extent did you trust (i.e. believe in the accuracy of) the
automation aid in this scenario?

To what extent did you rely on (i.e. actually use) the automation aid in
this scenario?

To what extent were you self-confident that you could successfully
perform without the automation aid in this scenario?

To what extent do you think the automation improved your
performance in this scenario compared to performance without the
automation?

Appendix E: Complacency Potential Rating Scale

1. Manually sorting through card catalogs is more reliable than computer-aided searches
for finding items in a library.

2. If I need to have a tumor in my body removed, I would choose to undergo
computer-aided surgery using laser technology because computerized surgery is more
reliable and safer than manual surgery.

3. People save time by using automatic teller machines (ATMs) rather than a bank teller in
making transactions.

4. I do not trust automated devices such as ATMs and computerized airline reservations
systems.

5. People who work frequently with automated devices have lower job satisfaction because
they feel less involved in their job than those who work manually.

6. I feel safer depositing my money at an ATM than with a human teller.

7. I have to record an important TV program for a class assignment. To ensure that the
correct program is recorded, I would use the automatic programming facility on my
recording device rather than manual taping.

8. People whose jobs require them to work with automated systems are lonelier than
people who do not work with such devices.

9. Automated systems used in modern aircraft, such as the automatic landing system, have
made their journey safer.

10. ATMs provide a safeguard against the inappropriate use of an individual's bank account
by dishonest people.

11. Automated devices used in aviation and banking have made work easier for both
employees and customers.

12. I often use automated devices.

13. People who work with automated devices have greater job satisfaction because they
feel more involved than those who work manually.

14. Automated devices in medicine save time and money in the diagnosis and treatment of
disease.

15. Even though the automatic cruise control in my car is set to a speed below the speed
limit, I worry when I pass a police radar speed trap in case the automatic control is not
working properly.

16. Bank transactions have become safer with the introduction of computer technology for
the transfer of funds.

17. I would rather purchase an item using a computer than have to deal with the sales
representative on the phone because my order is more likely to be correct using the
computer.

18. Work has become more difficult with the increase of automation in aviation and
banking.

19. I do not like to use ATMs because I feel that they are sometimes unreliable.

20. I think that automated devices used in medicine, such as CAT scans and ultrasound,
provide very reliable medical diagnosis.